A markovian approach for Arabic root extraction

Joint Authors

Boudlal, Abd al-Rahman
Belahbib, Rashid
Lakhouaja, Abd al-Haqq
Mazroui, Izz al-Din
Mizyan, Abd al-ouafi
Bebah, Muhammad

Source

The International Arab Journal of Information Technology

Issue

Vol. 8, Issue 1 (31 Jan. 2011), pp.91-98, 8 p.

Publisher

Zarqa University

Publication Date

2011-01-31

Country of Publication

Jordan

No. of Pages

8

Main Subjects

Information Technology and Computer Science

Abstract EN

In this paper, we present an Arabic morphological analysis system that assigns to each word of an unvoweled Arabic sentence a single root, chosen according to the context.

The proposed system is composed of two modules.

The first module performs an out-of-context analysis.

In this module, we segment each word of the sentence into its elementary morphological units in order to identify its possible roots.

For that, we adopt the segmentation of the word into three parts (prefix, stem, suffix).
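
The three-part segmentation described above can be illustrated with a minimal sketch. The affix lists, the Latin transliteration, the minimum stem length, and the function name `segmentations` are all assumptions made for illustration; the abstract does not specify the system's actual affix inventory or matching rules.

```python
# Hypothetical, deliberately tiny affix lists in Latin transliteration.
# They are illustrative only and are not taken from the paper.
PREFIXES = ["", "w", "b", "l", "Al", "wAl", "bAl"]
SUFFIXES = ["", "h", "hA", "At", "wn", "yn"]


def segmentations(word, min_stem=2):
    """Return every (prefix, stem, suffix) split allowed by the affix lists."""
    results = []
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            # Assumed constraint: discard splits whose stem is too short.
            if len(stem) >= min_stem:
                results.append((prefix, stem, suffix))
    return results


for split in segmentations("wAlktAb"):
    print(split)
```

Each surviving split yields a candidate stem, from which a real system would then derive possible roots; that derivation step is not sketched here because the abstract gives no detail on it.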

In the second module, we use the context to identify the correct root among all the possible roots of the word.

For this purpose, we use a Hidden Markov Model approach, in which the words are the observations and the possible roots are the hidden states.
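
A common way to decode such a model is the Viterbi algorithm; the abstract does not say which decoding procedure the authors use, so the following is only a sketch under that assumption. The function name `viterbi`, the probability tables, the smoothing floor, and the placeholder word and root labels are all hypothetical. The hidden states at each position are restricted to the candidate roots proposed for that word by the out-of-context analysis.

```python
import math


def viterbi(words, candidates, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely root sequence for `words`.

    candidates[i] lists the possible roots of words[i].
    Probabilities missing from a table fall back to `floor` (assumed smoothing).
    """
    def logp(table, key):
        return math.log(table.get(key, floor))

    # best[i][root] = (log-probability of the best path ending in root, previous root)
    best = [{
        r: (logp(start_p, r) + logp(emit_p, (r, words[0])), None)
        for r in candidates[0]
    }]
    for i in range(1, len(words)):
        layer = {}
        for r in candidates[i]:
            prev, score = max(
                ((p, s + logp(trans_p, (p, r))) for p, (s, _) in best[-1].items()),
                key=lambda pair: pair[1],
            )
            layer[r] = (score + logp(emit_p, (r, words[i])), prev)
        best.append(layer)

    # Backtrack from the best final root.
    root = max(best[-1], key=lambda r: best[-1][r][0])
    path = [root]
    for i in range(len(words) - 1, 0, -1):
        root = best[i][root][1]
        path.append(root)
    return path[::-1]


# Toy example with placeholder labels: the first word is ambiguous between
# two roots, and the transition into the second word's root resolves it.
words = ["word1", "word2"]
candidates = [["root_a", "root_b"], ["root_c"]]
start_p = {"root_a": 0.5, "root_b": 0.5}
trans_p = {("root_a", "root_c"): 0.1, ("root_b", "root_c"): 0.9}
emit_p = {("root_a", "word1"): 0.5, ("root_b", "word1"): 0.5, ("root_c", "word2"): 1.0}
print(viterbi(words, candidates, start_p, trans_p, emit_p))
```

In the toy example the two candidate roots of the first word are equally likely in isolation, so only the transition probabilities, that is, the context, decide between them.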

We validate the approach on the NEMLAR Arabic written corpus, which consists of 500,000 words.

The system returns the correct root for more than 98% of the words in the training set and for almost 94% of the words in the testing set.

American Psychological Association (APA)

Boudlal, Abd al-Rahman & Belahbib, Rashid & Lakhouaja, Abd al-Haqq & Mazroui, Izz al-Din & Mizyan, Abd al-ouafi & Bebah, Muhammad. 2011. A markovian approach for Arabic root extraction. The International Arab Journal of Information Technology, Vol. 8, no. 1, pp.91-98.
https://search.emarefa.net/detail/BIM-244543

Modern Language Association (MLA)

Boudlal, Abd al-Rahman…[et al.]. A markovian approach for Arabic root extraction. The International Arab Journal of Information Technology Vol. 8, no. 1 (Jan. 2011), pp.91-98.
https://search.emarefa.net/detail/BIM-244543

American Medical Association (AMA)

Boudlal, Abd al-Rahman & Belahbib, Rashid & Lakhouaja, Abd al-Haqq & Mazroui, Izz al-Din & Mizyan, Abd al-ouafi & Bebah, Muhammad. A markovian approach for Arabic root extraction. The International Arab Journal of Information Technology. 2011. Vol. 8, no. 1, pp.91-98.
https://search.emarefa.net/detail/BIM-244543

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references : p. 96-98

Record ID

BIM-244543