An Arabic lemma-based stemmer for latent topic modeling

Joint Authors

Benyettou, Abd al-Qadir
Brahmi, Abd al-Razzaq
al-Sharif, Ahmad T.

Source

The International Arab Journal of Information Technology

Issue

Vol. 10, Issue 2 (31 Mar. 2013), 9 p.

Publisher

Zarqa University

Publication Date

2013-03-31

Country of Publication

Jordan

No. of Pages

9

Main Subjects

Languages & Comparative Literature

Abstract EN

Progress in Arabic information retrieval has not kept pace with the growing use of the Arabic Web over the last decade.

Semantic indexing in a highly inflected language such as Arabic is not a trivial task and requires analyzing the text in its original language.

Apart from cross-language retrieval methods and a few limited studies, the main efforts to develop semantic analysis and topic modeling methods have not covered Arabic text.

This paper describes our approach to analyzing semantics in Arabic texts.

A new lemma-based stemmer is developed and compared with a root-based one for characterizing Arabic text.

The Latent Dirichlet Allocation (LDA) model is adapted to extract Arabic latent topics from various real-world corpora.

In addition to uncovering interesting subjects in press articles from the 2007-2009 period, the experiments show that classification performance in the topic space improves with lemma-based stemming compared with root-based stemming.

American Psychological Association (APA)

Brahmi, Abd al-Razzaq & al-Sharif, Ahmad T. & Benyettou, Abd al-Qadir. 2013. An Arabic lemma-based stemmer for latent topic modeling. The International Arab Journal of Information Technology, Vol. 10, no. 2.
https://search.emarefa.net/detail/BIM-311948

Modern Language Association (MLA)

Benyettou, Abd al-Qadir…[et al.]. An Arabic lemma-based stemmer for latent topic modeling. The International Arab Journal of Information Technology Vol. 10, no. 2 (Mar. 2013).
https://search.emarefa.net/detail/BIM-311948

American Medical Association (AMA)

Brahmi, Abd al-Razzaq & al-Sharif, Ahmad T. & Benyettou, Abd al-Qadir. An Arabic lemma-based stemmer for latent topic modeling. The International Arab Journal of Information Technology. 2013. Vol. 10, no. 2.
https://search.emarefa.net/detail/BIM-311948

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references.

Record ID

BIM-311948