A rule-based approach for tagging non-vocalized Arabic words

Joint Authors

al-Taani, Ahmad T.
Abu al-Rubb, Salah

Source

The International Arab Journal of Information Technology

Issue

Vol. 6, Issue 3 (31 Jul. 2009), pp. 320-328, 9 p.

Publisher

Zarqa University

Publication Date

2009-07-31

Country of Publication

Jordan

No. of Pages

9

Main Subjects

Information Technology and Computer Science

Abstract EN

In this work, we present a tagging system that assigns tags (word classes) to the words of a non-vocalized Arabic text.

The proposed tagging system passes through three levels of analysis.

The first level is a lexical analyzer built on a lexicon that contains all fixed words and particles, such as prepositions and pronouns.
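A minimal sketch of the lexical level, assuming the lexicon is a simple lookup table from fixed words to tags. The entries and the tag names (`PREP`, `PRON`, `PART`) are hypothetical; the abstract does not give the paper's actual lexicon or tagset.

```python
from typing import Optional

# Hypothetical lexicon of fixed (closed-class) words: prepositions,
# pronouns, and particles, each mapped to an illustrative tag.
LEXICON: dict[str, str] = {
    "في": "PREP",   # fi, "in"
    "إلى": "PREP",  # ila, "to"
    "على": "PREP",  # 'ala, "on"
    "هو": "PRON",   # huwa, "he"
    "هي": "PRON",   # hiya, "she"
    "لم": "PART",   # lam, negation particle
}


def lexical_tag(word: str) -> Optional[str]:
    """Return the tag of a fixed word, or None if it is not in the lexicon."""
    return LEXICON.get(word)


print(lexical_tag("في"))     # PREP
print(lexical_tag("كتاب"))   # None: an open-class word, left to the next level
```

Words that return `None` here are the ones the later levels would have to classify.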

The second level is a morphological analyzer that uses word structure, namely patterns and affixes, to determine the word class.
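The morphological level can be sketched as a small set of pattern and affix rules. The specific patterns (مفعول, استفعال), affixes (ال, ة, وا), and the minimum-stem-length check are illustrative assumptions, not the paper's rule set. Non-vocalized text also has exceptions (for example, some verbs begin with the letters ال), which is one reason a purely morphological pass cannot tag every word.

```python
from typing import Optional

# Patterns are written with the root placeholders ف, ع, ل; each placeholder
# matches any single letter, and every other letter must match literally.
ROOT_SLOTS = {"ف", "ع", "ل"}

# Illustrative noun patterns: passive participle and form-X verbal noun.
NOUN_PATTERNS = ["مفعول", "استفعال"]


def matches_pattern(word: str, pattern: str) -> bool:
    """True if `word` instantiates `pattern` letter by letter."""
    if len(word) != len(pattern):
        return False
    return all(p in ROOT_SLOTS or p == w for p, w in zip(pattern, word))


def morphological_tag(word: str) -> Optional[str]:
    """Guess a word class from patterns, then affixes; None if no rule fires."""
    if any(matches_pattern(word, p) for p in NOUN_PATTERNS):
        return "NOUN"
    if word.startswith("ال") and len(word) - 2 >= 3:
        return "NOUN"  # word-initial ال is usually the definite article
    if word.endswith("ة"):
        return "NOUN"  # ta marbuta marks nouns and adjectives, not verbs
    if word.endswith("وا") and len(word) - 2 >= 3:
        return "VERB"  # past-tense plural suffix, as in كتبوا
    return None


for w in ["مكتوب", "الكتاب", "مدرسة", "كتبوا", "استخدام", "كتب"]:
    print(w, morphological_tag(w))
```

Patterns are checked first in this sketch because they constrain the whole word, while a single affix is weaker evidence; a bare root such as كتب matches no rule and stays untagged.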

The third level is a syntax analyzer (grammatical tagger), which assigns grammatical tags to words based on their context or their position in the sentence.

The syntax analyzer level consists of two stages: the first relies on specific keywords that indicate the tag of the following word; the second is a reversed parsing technique that scans the available grammar rules of Arabic to determine the class of a single ambiguous word in the sentence.
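The two syntax stages can be sketched as follows. This is one reading of the abstract, not the authors' implementation: the keyword table, the toy grammar of tag sequences, and the choice to commit a tag only when exactly one candidate class fits are all assumptions for illustration. Tags produced by the earlier levels are the input, with `None` marking a word those levels left untagged.

```python
from typing import Optional

Tag = Optional[str]

# Stage 1 (hypothetical keyword table): a keyword fixes the class of the
# word that immediately follows it.
KEYWORDS: dict[str, str] = {
    "في": "NOUN",   # preposition, followed by a (genitive) noun
    "إلى": "NOUN",  # preposition, followed by a noun
    "على": "NOUN",  # preposition, followed by a noun
    "لم": "VERB",   # lam, followed by a jussive imperfect verb
    "قد": "VERB",   # qad, followed by a verb
}

# Stage 2 (hypothetical toy grammar): each rule is an allowed tag sequence.
GRAMMAR: set[tuple[str, ...]] = {
    ("VERB", "NOUN"),
    ("VERB", "NOUN", "NOUN"),
    ("NOUN", "NOUN"),
    ("NOUN", "VERB", "NOUN"),
    ("VERB", "NOUN", "PREP", "NOUN"),
}
CANDIDATES = ("NOUN", "VERB")


def keyword_stage(words: list[str], tags: list[Tag]) -> list[Tag]:
    """Fill each unknown tag from the keyword that immediately precedes it."""
    out = list(tags)
    for i in range(1, len(words)):
        if out[i] is None and words[i - 1] in KEYWORDS:
            out[i] = KEYWORDS[words[i - 1]]
    return out


def reversed_parse(tags: list[Tag]) -> list[Tag]:
    """Try each candidate class for the single unknown tag and keep it only
    if exactly one candidate makes the sentence match a grammar rule."""
    unknown = [i for i, t in enumerate(tags) if t is None]
    if len(unknown) != 1:
        return list(tags)  # this sketch handles exactly one ambiguous word
    i = unknown[0]
    fits = [c for c in CANDIDATES
            if tuple(tags[:i] + [c] + tags[i + 1:]) in GRAMMAR]
    out = list(tags)
    if len(fits) == 1:
        out[i] = fits[0]
    return out


# "لم يكتب الولد": the keyword لم marks the unknown word يكتب as a verb.
print(keyword_stage(["لم", "يكتب", "الولد"], ["PART", None, "NOUN"]))
# "ذهب الولد إلى المدرسة": without vowels ذهب may be "went" or "gold";
# only the VERB reading matches a grammar rule.
print(reversed_parse([None, "NOUN", "PREP", "NOUN"]))
```

When both candidates fit a rule (for example a two-word sentence), the sketch leaves the word untagged rather than guessing.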

We tested the proposed system on a corpus of 2,355 words.

Experimental results showed that the proposed system correctly tagged nearly 94% of the words in the test sample.
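The success rate is the fraction of correctly tagged words. As a rough worked check, assuming the rate was computed over the full 2,355-word corpus (the abstract does not report the exact number of correctly tagged words), a rate near 94% corresponds to about 2,214 words:

$$
\text{success rate} = \frac{N_{\text{correct}}}{N_{\text{total}}} \approx 0.94,
\qquad
N_{\text{correct}} \approx 0.94 \times 2355 = 2213.7 \approx 2214 .
$$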

American Psychological Association (APA)

al-Taani, Ahmad T. & Abu al-Rubb, Salah. 2009. A rule-based approach for tagging non-vocalized Arabic words. The International Arab Journal of Information Technology, Vol. 6, no. 3, pp. 320-328.
https://search.emarefa.net/detail/BIM-10441

Modern Language Association (MLA)

al-Taani, Ahmad T. & Abu al-Rubb, Salah. A rule-based approach for tagging non-vocalized Arabic words. The International Arab Journal of Information Technology Vol. 6, no. 3 (Jul. 2009), pp. 320-328.
https://search.emarefa.net/detail/BIM-10441

American Medical Association (AMA)

al-Taani, Ahmad T. & Abu al-Rubb, Salah. A rule-based approach for tagging non-vocalized Arabic words. The International Arab Journal of Information Technology. 2009. Vol. 6, no. 3, pp. 320-328.
https://search.emarefa.net/detail/BIM-10441

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: pp. 326-327

Record ID

BIM-10441