Enhancing the accuracy of Sonbol’s Arabic root extraction algorithm

Co-authors

Hanin Zahri, Nik Adilah
Thalji, Nisrin
Thalji, Ziyad
al-Hakim, Suhayr

Source

Jordanian Journal of Computers and Information Technology

Issue

Volume 4, Issue 3 (31 December 2018), pp. 159-174, 16 p.

Publisher

Princess Sumaya University for Technology

Publication date

2018-12-31

Country of publication

Jordan

Number of pages

16

Main subjects

Arabic language and literature

Abstract

Root extraction is an important primary process in most Arabic applications, such as information retrieval systems, text mining, text classifiers, question answering systems, data compression, indexes, spelling checkers, text summarization and machine translation.

Any weakness in root extraction negatively affects the performance of these applications.

Sonbol’s Arabic root extraction algorithm achieves high accuracy and introduces a new classification of Arabic letters that minimizes affix ambiguity.

Comparison and testing of existing Arabic root extraction algorithms on unified datasets show that they still need enhancement.

Arabic root extraction relies mainly on patterns: the more patterns an algorithm has, the better its accuracy.

In this study, we improve Sonbol’s Arabic root extraction algorithm by enhancing its rules and increasing the number of its patterns.

We use 4320 patterns to extract the roots, which is the largest pattern list extracted from Thalji’s corpus.

We test the new algorithm on Thalji’s corpus that contains 720,000 word-root pairs.

This corpus was built mainly to test and compare Arabic root extraction algorithms.

The new algorithm is compared with Sonbol’s Arabic root extraction algorithm.

The algorithm of Sonbol et al. achieves an accuracy of 68%, whereas the new algorithm achieves an accuracy of 92%.
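
To illustrate the idea that accuracy depends on pattern coverage, here is a minimal sketch of generic pattern-based root extraction and of accuracy measured over word-root pairs. It is not the algorithm of Sonbol et al. nor the enhanced algorithm described in this record: the function names, the tiny pattern list, and the four sample pairs are all assumptions made for illustration, and only triliteral roots without diacritics are handled.

```python
# Illustrative sketch only: generic template matching, NOT the published algorithm.
# In Arabic morphological patterns, the letters ف ع ل stand for the root letters.
ROOT_SLOTS = "فعل"

# Hypothetical, tiny pattern list (the paper reports using 4320 patterns).
PATTERNS = ["مفعول", "فاعل", "فعل"]

# Hypothetical word-root pairs standing in for a test corpus.
PAIRS = [
    ("مكتوب", "كتب"),
    ("كاتب", "كتب"),
    ("دارس", "درس"),
    ("مدرسة", "درس"),
]


def match_pattern(word, pattern):
    """Return the root letters if word fits pattern, otherwise None."""
    if len(word) != len(pattern):
        return None
    root = []
    for w, p in zip(word, pattern):
        if p in ROOT_SLOTS:
            root.append(w)      # slot position: this letter belongs to the root
        elif w != p:
            return None         # fixed (affix) letter does not match
    return "".join(root)


def extract_root(word, patterns):
    """Try each pattern in order and return the first root found, or None."""
    for pattern in patterns:
        root = match_pattern(word, pattern)
        if root is not None:
            return root
    return None


def accuracy(pairs, patterns):
    """Fraction of (word, root) pairs whose extracted root equals the reference."""
    correct = sum(1 for word, root in pairs if extract_root(word, patterns) == root)
    return correct / len(pairs)


if __name__ == "__main__":
    print(accuracy(PAIRS, PATTERNS))              # three of four pairs match
    print(accuracy(PAIRS, PATTERNS + ["مفعلة"]))  # the added pattern covers مدرسة
```

With the three starting patterns, the word مدرسة fits no template and is counted as an error; adding the single pattern مفعلة resolves it. This toy case mirrors, on a very small scale, the abstract's point that a larger pattern list tends to improve accuracy.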

American Psychological Association (APA) citation style

Thalji, Nisrin & Hanin Zahri, Nik Adilah & Thalji, Ziyad & al-Hakim, Suhayr. 2018. Enhancing the accuracy of Sonbol’s Arabic root extraction algorithm. Jordanian Journal of Computers and Information Technology, Vol. 4, no. 3, pp. 159-174.
https://search.emarefa.net/detail/BIM-1415329

Modern Language Association (MLA) citation style

Thalji, Nisrin … [et al.]. Enhancing the accuracy of Sonbol’s Arabic root extraction algorithm. Jordanian Journal of Computers and Information Technology Vol. 4, no. 3 (Dec. 2018), pp. 159-174.
https://search.emarefa.net/detail/BIM-1415329

American Medical Association (AMA) citation style

Thalji, Nisrin & Hanin Zahri, Nik Adilah & Thalji, Ziyad & al-Hakim, Suhayr. Enhancing the accuracy of Sonbol’s Arabic root extraction algorithm. Jordanian Journal of Computers and Information Technology. 2018. Vol. 4, no. 3, pp. 159-174.
https://search.emarefa.net/detail/BIM-1415329

Data type

Articles

Text language

English

Notes

Includes bibliographical references: p. 168-169

Record number

BIM-1415329