Arabic light stemmer based on regular expression technique

Joint Authors

al-Badarinah, Mutasim
Mahyub, Nizar
Shihab, Muhammad A.
al-Shalabi, Riyad
Kanan, Ghassan

Source

International Computer Sciences and Informatics Conference, Amman, Jordan, 12-13 January 2016.

Publisher

Amman Arab University

Publication Date

2016-01-31

Country of Publication

Jordan

No. of Pages

9

Main Subjects

Information Technology and Computer Science

English Abstract

An Arabic light stemmer strips affixes from words and removes stop words.

It is considered a text pre-processing task for many Natural Language Processing (NLP) applications, such as text categorization, information retrieval, and opinion mining.

Many Arabic light stemmers have been presented, and they depend on several techniques, such as grammar-based, pattern-based, and mathematical rule-based approaches.

In this paper, a new Arabic stemmer based on regular expressions is proposed; it relies on regular expressions to check whether the input word matches its text pattern.
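To make the idea of regex-based light stemming concrete, here is a minimal sketch in Python. It is not the paper's method: this record does not list the patterns the authors used, so the affix lists, the stop-word list, and the names `PATTERN`, `light_stem`, and `preprocess` are all assumptions chosen for illustration.

```python
import re

# Illustrative affix lists (an assumption; the paper's patterns are not given here).
# Longer prefixes come first so the alternation tries them before shorter ones.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]

# Illustrative stop-word list (also an assumption).
STOP_WORDS = {"في", "من", "على", "الى", "عن"}

# One pattern: optional prefix, a stem of at least three Arabic letters
# (matched lazily so a trailing suffix is peeled off when possible),
# then an optional suffix.
PATTERN = re.compile(
    "^(?:" + "|".join(PREFIXES) + ")?"
    "(?P<stem>[\u0621-\u064A]{3,}?)"
    "(?:" + "|".join(SUFFIXES) + ")?$"
)


def light_stem(word):
    """Return the stem captured by PATTERN, or the word unchanged if it does not match."""
    match = PATTERN.match(word)
    return match.group("stem") if match else word


def preprocess(tokens):
    """Drop stop words, then light-stem the remaining tokens."""
    return [light_stem(t) for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    for w in ["المعلمون", "والكتاب", "المدرسة"]:
        print(w, "->", light_stem(w))
```

A single anchored pattern keeps the "does this word fit the pattern?" check and the stem extraction in one step. A real stemmer would likely need several patterns and normalization of letter variants, which this sketch omits.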

This stemmer is designed with two modes: (i) using only the proposed regular expression methodology, and (ii) using the Microsoft Word dictionary in addition to the proposed stemmer.
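The difference between a regex-only mode and a dictionary-assisted mode can be sketched as follows. This is a hypothetical illustration, not the authors' design: the record does not say how the dictionary is consulted, so the sketch assumes it is used to pick, among regex-generated candidates, the first one that is a known word. A plain Python set stands in for the Microsoft Word dictionary, and the names `candidates` and `stem` and the short affix patterns are invented for the example.

```python
import re

# Short illustrative affix patterns (assumptions, not the paper's patterns).
PREFIX_RE = re.compile("^(?:وال|بال|ال|و)")
SUFFIX_RE = re.compile("(?:ون|ين|ات|ة)$")


def candidates(word):
    """Return candidate stems, most aggressively stripped first, without duplicates."""
    no_prefix = PREFIX_RE.sub("", word)
    forms = [
        SUFFIX_RE.sub("", no_prefix),  # prefix and suffix removed
        no_prefix,                     # prefix removed only
        SUFFIX_RE.sub("", word),       # suffix removed only
        word,                          # unchanged
    ]
    result = []
    for form in forms:
        # Keep only candidates of at least three letters.
        if len(form) >= 3 and form not in result:
            result.append(form)
    return result


def stem(word, lexicon=None):
    """Mode (i): lexicon is None, return the most stripped candidate.
    Mode (ii): return the first candidate found in the lexicon,
    falling back to the mode (i) answer when none is found."""
    cands = candidates(word)
    if not cands:
        return word
    if lexicon is not None:
        for cand in cands:
            if cand in lexicon:
                return cand
    return cands[0]


if __name__ == "__main__":
    lexicon = {"وزارة", "كتاب"}  # hypothetical stand-in for a real dictionary
    print(stem("وزارة"))                   # regex-only mode
    print(stem("وزارة", lexicon=lexicon))  # dictionary-assisted mode
```

In this sketch the regex-only mode strips the leading "و" of "وزارة" as if it were a conjunction, while the dictionary-assisted mode keeps the full word because it appears in the lexicon. Preventing that kind of over-stemming is one plausible reason to add a dictionary check.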

The proposed methods achieved accuracy between 73.3% and 79.6%.

Data Type

Conference Papers

Record ID

BIM-767331

American Psychological Association (APA)

al-Shalabi, Riyad & Kanan, Ghassan & Shihab, Muhammad A. & al-Badarinah, Mutasim & Mahyub, Nizar. 2016-01-31. Arabic light stemmer based on regular expression technique. pp. 295-303. Amman, Jordan: Amman Arab University.
https://search.emarefa.net/detail/BIM-767331

Modern Language Association (MLA)

al-Shalabi, Riyad, et al. Arabic light stemmer based on regular expression technique. Amman, Jordan: Amman Arab University. 2016-01-31.
https://search.emarefa.net/detail/BIM-767331

American Medical Association (AMA)

al-Shalabi, Riyad & Kanan, Ghassan & Shihab, Muhammad A. & al-Badarinah, Mutasim & Mahyub, Nizar. Arabic light stemmer based on regular expression technique.
https://search.emarefa.net/detail/BIM-767331