Enhancing the accuracy of Sonbol’s Arabic root extraction algorithm
Joint Authors
Hanin Zahri, Nik Adilah
Thalji, Nisrin
Thalji, Ziyad
al-Hakim, Suhayr
Source
Jordanian Journal of Computers and Information Technology
Issue
Vol. 4, Issue 3 (31 Dec. 2018), pp.159-174, 16 p.
Publisher
Princess Sumaya University for Technology
Publication Date
2018-12-31
Country of Publication
Jordan
No. of Pages
16
Main Subjects
Arabic language and Literature
Abstract EN
Root extraction is an important primary process in most Arabic applications, such as information retrieval systems, text mining, text classifiers, question answering systems, data compression, indexes, spelling checkers, text summarization and machine translation.
Any weakness in root extraction negatively affects the performance of these applications.
Sonbol’s Arabic root extraction algorithm achieves high accuracy and introduces a new classification of Arabic letters that minimizes affix ambiguity.
Comparing and testing the existing Arabic root extraction algorithms on unified datasets shows that they still need some enhancements.
Arabic root extraction is mainly based on patterns: the more patterns an algorithm has, the better its accuracy.
In this study, we improve Sonbol’s Arabic root extraction algorithm by enhancing its rules and increasing its patterns.
We use 4,320 patterns to extract the roots, which is the largest pattern list extracted from Thalji’s corpus.
We test the new algorithm on Thalji’s corpus, which contains 720,000 word-root pairs.
This corpus was built mainly to test and compare Arabic root extraction algorithms.
The new algorithm is compared with Sonbol’s Arabic root extraction algorithm.
The algorithm of Sonbol et al. achieves an accuracy of 68%, whereas the new algorithm achieves an accuracy of 92%.
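The abstract describes pattern-based root extraction and an accuracy measured over word-root pairs. The following is a minimal sketch of that general idea only. It is not Sonbol's algorithm or the authors' enhanced version, whose rules and letter classification are not given in this record. The sketch assumes triliteral roots, uses the conventional placeholder letters ف, ع and ل to mark root slots in a pattern, and uses a tiny hypothetical pattern list in place of the 4,320 patterns mentioned above. The function names `extract_root` and `accuracy` are illustrative.

```python
# Placeholder letters marking the three root slots in a pattern
# (assumption: triliteral roots only).
SLOTS = {"ف": 0, "ع": 1, "ل": 2}

# Tiny hypothetical pattern list for illustration; not the paper's list.
PATTERNS = ["فاعل", "مفعول", "فعال", "فعل"]


def extract_root(word, patterns=PATTERNS):
    """Return the root from the first pattern that matches, else None."""
    for pattern in patterns:
        if len(pattern) != len(word):
            continue
        root = [None, None, None]
        matched = True
        for p_ch, w_ch in zip(pattern, word):
            if p_ch in SLOTS:
                # Slot letter: the word's letter here belongs to the root.
                root[SLOTS[p_ch]] = w_ch
            elif p_ch != w_ch:
                # Literal (affix) letter must match the word exactly.
                matched = False
                break
        if matched and all(root):
            return "".join(root)
    return None


def accuracy(pairs, patterns=PATTERNS):
    """Fraction of (word, root) pairs whose extracted root equals the gold root."""
    correct = sum(1 for word, root in pairs if extract_root(word, patterns) == root)
    return correct / len(pairs)
```

With this toy list, a word whose shape matches none of the patterns yields no root. This illustrates why a larger pattern list can raise accuracy on a fixed test corpus.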
American Psychological Association (APA)
Thalji, Nisrin & Hanin Zahri, Nik Adilah & Thalji, Ziyad & al-Hakim, Suhayr. 2018. Enhancing the accuracy of Sonbol’s Arabic root extraction algorithm. Jordanian Journal of Computers and Information Technology, Vol. 4, no. 3, pp.159-174.
https://search.emarefa.net/detail/BIM-1415329
Modern Language Association (MLA)
Thalji, Nisrin…[et al.]. Enhancing the accuracy of Sonbol’s Arabic root extraction algorithm. Jordanian Journal of Computers and Information Technology Vol. 4, no. 3 (Dec. 2018), pp.159-174.
https://search.emarefa.net/detail/BIM-1415329
American Medical Association (AMA)
Thalji, Nisrin & Hanin Zahri, Nik Adilah & Thalji, Ziyad & al-Hakim, Suhayr. Enhancing the accuracy of Sonbol’s Arabic root extraction algorithm. Jordanian Journal of Computers and Information Technology. 2018. Vol. 4, no. 3, pp.159-174.
https://search.emarefa.net/detail/BIM-1415329
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references : p. 168-169
Record ID
BIM-1415329