Arabic text classification using k-nearest neighbour algorithm

Joint Authors

Alhutaish, Roiss
Umar, Nazlia

Source

The International Arab Journal of Information Technology

Issue

Vol. 12, Issue 2 (31 Mar. 2015), 6 p.

Publisher

Zarqa University

Publication Date

2015-03-31

Country of Publication

Jordan

No. of Pages

6

Main Subjects

Information Technology and Computer Science

Abstract EN

Many algorithms have been applied to the problem of Automatic Text Categorization (ATC).

Most of the work in this area has been carried out on English texts, with only a few researchers addressing Arabic texts.

We have investigated the use of the K-NN classifier with a new similarity measure (Inew), as well as with Cosine, Jaccard, and Dice similarities, in order to enhance Arabic ATC.
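As an illustration only, the three standard similarities and a majority-vote K-NN step might be sketched as follows. This is a minimal sketch, not the authors' implementation: documents are assumed to be term-to-weight dictionaries, Jaccard and Dice are shown in their common set-based forms, and the function names are hypothetical. The Inew measure is not defined in this record, so it is not shown.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term->weight dictionaries."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(a, b):
    """Set-based Jaccard: shared terms divided by all distinct terms."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def dice(a, b):
    """Set-based Dice: twice the shared terms divided by the summed sizes."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0

def knn_predict(query, training, k, similarity):
    """Label a query by majority vote among its k most similar training docs.

    training is a list of (document, label) pairs (an assumed layout).
    """
    if not training:
        raise ValueError("training set must not be empty")
    ranked = sorted(training, key=lambda item: similarity(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Passing the similarity as a parameter lets the same K-NN routine be compared across measures, which mirrors the kind of comparison the abstract describes.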

We represent the dataset in both unstemmed and stemmed form, using TREC-2002 to remove prefixes and suffixes for the stemmed version.

For statistical text representation, Bag-Of-Words (BOW) and character-level 3-grams (3-Gram) were used.
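The two representations can be illustrated with a short sketch. It assumes whitespace tokenization for BOW and a sliding window over the raw character sequence (spaces included) for 3-grams; the record does not state which tokenization or windowing the authors used, and the function names are hypothetical.

```python
from collections import Counter

def bag_of_words(text):
    """Count whitespace-separated tokens (assumed tokenization)."""
    return Counter(text.split())

def char_ngrams(text, n=3):
    """Count overlapping character n-grams with a sliding window.

    Python strings are sequences of Unicode code points, so Arabic
    letters are handled one character at a time.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))
```

For example, `char_ngrams("كتاب")` yields the two 3-grams `كتا` and `تاب`, each with a count of one.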

In order to reduce the dimensionality of feature space, we used several feature selection methods.

Experiments conducted with Arabic text showed that the K-NN classifier with the new similarity measure (Inew) achieved 92.6% Macro-F1 and performed better than the K-NN classifier with Cosine, Jaccard, and Dice similarities.
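For readers unfamiliar with the metric, Macro-F1 is the unweighted mean of the per-class F1 scores. The sketch below shows the standard definition; it is not taken from the paper, and the function name and list-based inputs are assumptions.

```python
def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores, giving every class equal weight."""
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

Because every class counts equally, a classifier that does poorly on a small category is penalized as much as one that does poorly on a large category.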

Chi-Square feature selection, with representation by Bag-Of-Words (BOW), led to the best performance over other feature selection methods using BOW and 3-Gram.
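Chi-Square feature selection scores each term by how strongly its presence depends on a category. A minimal sketch of the commonly used 2x2 contingency form follows; the exact variant the authors applied is not given in this record, and the function names and the set-of-terms document layout are assumptions.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for one term and one category.

    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category that lack the term
    d: docs outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def chi_square_scores(docs, labels, target):
    """Score every term for the target category; docs is a list of term sets."""
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        a = b = c = d = 0
        for terms, label in zip(docs, labels):
            has_term = term in terms
            in_class = label == target
            if has_term and in_class:
                a += 1
            elif has_term:
                b += 1
            elif in_class:
                c += 1
            else:
                d += 1
        scores[term] = chi_square(a, b, c, d)
    return scores
```

Keeping only the highest-scoring terms per category is one way such scores are used to shrink the feature space before classification.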

American Psychological Association (APA)

Alhutaish, Roiss & Umar, Nazlia. 2015. Arabic text classification using k-nearest neighbour algorithm. The International Arab Journal of Information Technology, Vol. 12, no. 2.
https://search.emarefa.net/detail/BIM-368792

Modern Language Association (MLA)

Alhutaish, Roiss & Umar, Nazlia. Arabic text classification using k-nearest neighbour algorithm. The International Arab Journal of Information Technology Vol. 12, no. 2 (Mar. 2015).
https://search.emarefa.net/detail/BIM-368792

American Medical Association (AMA)

Alhutaish, Roiss & Umar, Nazlia. Arabic text classification using k-nearest neighbour algorithm. The International Arab Journal of Information Technology. 2015. Vol. 12, no. 2.
https://search.emarefa.net/detail/BIM-368792

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references.

Record ID

BIM-368792