Arabic text categorization

Publication Date

2007-04-30

Country of Publication

Jordan

No. of Pages

7

Main Subjects

Information Technology and Computer Science

Topics

Abstract EN

In this paper, we compare the performance of three classifiers for Arabic text categorization.

In particular, the naïve Bayes, k-nearest-neighbors (knn), and distance-based classifiers were used.

Unclassified documents were preprocessed by removing punctuation marks and stop words.

Each document was then represented as a vector of words (or, for the naïve Bayes classifier, as a vector of words and their frequencies).
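
The preprocessing and representation steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the stop-word list, the set of punctuation marks, and the function names `preprocess`, `to_word_set`, and `to_frequency_vector` are all hypothetical, since the record does not give them.

```python
import string
from collections import Counter

# Hypothetical, very small stop-word list; the list used in the paper is not given.
STOP_WORDS = {"في", "من", "على", "إلى", "عن"}

# ASCII punctuation plus a few common Arabic punctuation marks (an assumption).
PUNCTUATION = string.punctuation + "،؛؟"

# Translation table that maps every punctuation character to a space.
_PUNCT_TABLE = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))


def preprocess(text):
    """Remove punctuation, split on whitespace, and drop stop words."""
    cleaned = text.translate(_PUNCT_TABLE)
    return [token for token in cleaned.split() if token not in STOP_WORDS]


def to_word_set(tokens):
    """Represent a document as the set of words it contains."""
    return set(tokens)


def to_frequency_vector(tokens):
    """Represent a document as words with their frequencies."""
    return Counter(tokens)
```

For example, `to_frequency_vector(preprocess(document_text))` yields the word counts that a frequency-based classifier such as naïve Bayes would consume, while `to_word_set` gives the plain word representation.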

Stemming was used to reduce the dimensionality of feature vectors of documents.

The accuracy of the classifiers was compared using recall, precision, error rate, and fallout.
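
The four measures named above can be computed per category from a two-by-two contingency table. The sketch below assumes the standard text-categorization definitions; the record does not state the exact formulas used in the paper, and the function name `category_metrics` and its parameters are hypothetical.

```python
def category_metrics(tp, fp, fn, tn):
    """Compute per-category measures from contingency counts.

    tp: documents correctly assigned to the category
    fp: documents incorrectly assigned to the category
    fn: documents of the category that were missed
    tn: documents correctly left out of the category
    """
    total = tp + fp + fn + tn
    return {
        # Share of the category's documents that were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of assigned documents that truly belong to the category.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of non-category documents wrongly assigned to it.
        "fallout": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of all decisions that were wrong.
        "error_rate": (fp + fn) / total if total else 0.0,
    }
```

Returning 0.0 for an empty denominator is a design choice made here for convenience; other conventions, such as leaving the value undefined, are equally defensible.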

Experiments carried out on an in-house collection of Arabic text show that the naïve Bayes classifier outperforms the other two.

American Psychological Association (APA)

al-Duwayri, Rehab. 2007. Arabic text categorization. The International Arab Journal of Information Technology, Vol. 4, no. 2, pp. 125-131.
https://search.emarefa.net/detail/BIM-11633

Modern Language Association (MLA)

al-Duwayri, Rehab. Arabic text categorization. The International Arab Journal of Information Technology, Vol. 4, no. 2 (Apr. 2007), pp. 125-131.
https://search.emarefa.net/detail/BIM-11633

American Medical Association (AMA)

al-Duwayri, Rehab. Arabic text categorization. The International Arab Journal of Information Technology. 2007. Vol. 4, no. 2, pp. 125-131.
https://search.emarefa.net/detail/BIM-11633

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: p. 129-130

Record ID

BIM-11633
