Enhancing the accuracy of Sonbol’s Arabic root extraction algorithm

Joint Authors

Hanin Zahri, Nik Adilah
Thalji, Nisrin
Thalji, Ziyad
al-Hakim, Suhayr

Source

Jordanian Journal of Computers and Information Technology

Issue

Vol. 4, Issue 3 (31 Dec. 2018), pp.159-174, 16 p.

Publisher

Princess Sumaya University for Technology

Publication Date

2018-12-31

Country of Publication

Jordan

No. of Pages

16

Main Subjects

Arabic language and Literature

Abstract EN

Root extraction is an important primary process in most Arabic applications, such as information retrieval systems, text mining, text classifiers, question answering systems, data compression, indexes, spelling checkers, text summarization and machine translation.

Any weakness in root extraction negatively affects the performance of these applications.

Sonbol’s Arabic root extraction algorithm achieves high accuracy and introduces a new classification of Arabic letters that minimizes affix ambiguity.

Comparing and testing the existing Arabic root extraction algorithms on unified datasets shows that they still need some enhancements.

Arabic root extraction is mainly based on patterns: the more patterns an algorithm has, the better its accuracy.

In this study, we improve Sonbol’s Arabic root extraction algorithm by enhancing its rules and increasing the number of its patterns.

We use 4320 patterns to extract the roots, which is the largest list of patterns extracted from Thalji’s corpus.

We test the new algorithm on Thalji’s corpus, which contains 720,000 word-root pairs.

This corpus was built mainly to test and compare Arabic root extraction algorithms.

The new algorithm is compared with Sonbol’s Arabic root extraction algorithm.

The algorithm of Sonbol et al. achieves an accuracy of 68%, whereas the new algorithm achieves an accuracy of 92%.
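
As a rough illustration of what pattern-based root extraction and accuracy measurement over word-root pairs can look like, the following minimal Python sketch may help. It is not the algorithm of Sonbol et al. or the enhanced algorithm described in the abstract: the three-entry pattern list, the use of the letters ف ع ل as root-slot placeholders, and the names `match_pattern`, `extract_root` and `accuracy` are all assumptions made for this example, and the rules and affix handling of the real algorithms are omitted.

```python
from typing import Iterable, Optional, Sequence, Tuple

# Hypothetical toy pattern list (NOT the 4320 patterns mentioned in the abstract).
# The letters ف ع ل mark the positions of the three root consonants.
PATTERNS: Sequence[str] = ("فاعل", "مفعول", "فعل")
ROOT_SLOTS = "فعل"


def match_pattern(word: str, pattern: str) -> Optional[str]:
    """Return the root letters if `word` fits `pattern`, otherwise None."""
    if len(word) != len(pattern):
        return None
    root = []
    for w, p in zip(word, pattern):
        if p in ROOT_SLOTS:
            root.append(w)   # slot position: collect a root letter
        elif w != p:
            return None      # literal pattern letter does not match
    return "".join(root)


def extract_root(word: str, patterns: Sequence[str] = PATTERNS) -> Optional[str]:
    """Try each pattern in order and return the first root found."""
    for pattern in patterns:
        root = match_pattern(word, pattern)
        if root is not None:
            return root
    return None


def accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (word, gold_root) pairs whose extracted root equals the gold root."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(1 for word, gold in pairs if extract_root(word) == gold)
    return correct / len(pairs)


# Illustrative sample only; these pairs are not taken from Thalji's corpus.
sample = [("كاتب", "كتب"), ("مكتوب", "كتب"), ("استخرج", "خرج")]
score = accuracy(sample)
```

In this toy sample the third word matches none of the three patterns, so it counts as an error. That is the intuition behind the abstract's claim that a larger pattern list tends to raise accuracy.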

American Psychological Association (APA)

Thalji, Nisrin & Hanin Zahri, Nik Adilah & Thalji, Ziyad & al-Hakim, Suhayr. 2018. Enhancing the accuracy of Sonbol’s Arabic root extraction algorithm. Jordanian Journal of Computers and Information Technology, Vol. 4, no. 3, pp. 159-174.
https://search.emarefa.net/detail/BIM-1415329

Modern Language Association (MLA)

Thalji, Nisrin…[et al.]. Enhancing the accuracy of Sonbol’s Arabic root extraction algorithm. Jordanian Journal of Computers and Information Technology Vol. 4, no. 3 (Dec. 2018), pp. 159-174.
https://search.emarefa.net/detail/BIM-1415329

American Medical Association (AMA)

Thalji, Nisrin & Hanin Zahri, Nik Adilah & Thalji, Ziyad & al-Hakim, Suhayr. Enhancing the accuracy of Sonbol’s Arabic root extraction algorithm. Jordanian Journal of Computers and Information Technology. 2018. Vol. 4, no. 3, pp. 159-174.
https://search.emarefa.net/detail/BIM-1415329

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: p. 168-169

Record ID

BIM-1415329