Arabic text classification using k-nearest neighbour algorithm
Joint Authors
Source
The International Arab Journal of Information Technology
Issue
Vol. 12, Issue 2 (31 Mar. 2015), 6 p.
Publisher
Publication Date
2015-03-31
Country of Publication
Jordan
No. of Pages
6
Main Subjects
Information Technology and Computer Science
Topics
Abstract EN
Many algorithms have been applied to the problem of Automatic Text Categorization (ATC).
Most of the work in this area has been carried out on English texts, with only a few researchers addressing Arabic texts.
We investigated the use of the K-NN classifier with a new similarity measure (Inew) and with the Cosine, Jaccard, and Dice similarities, in order to enhance Arabic ATC.
We represent the dataset in both unstemmed and stemmed form, using TREC-2002 to remove prefixes and suffixes.
For statistical text representation, Bag-Of-Words (BOW) and character-level 3-grams (3-Gram) were used.
To reduce the dimensionality of the feature space, we used several feature selection methods.
Experiments conducted with Arabic text showed that the K-NN classifier with the new similarity measure (Inew), at 92.6% Macro-F1, performed better than the K-NN classifier with Cosine, Jaccard, and Dice similarities.
Chi-Square feature selection, with Bag-Of-Words (BOW) representation, gave the best performance among the feature selection methods tested with BOW and 3-Gram.
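As a rough illustration of the setup the abstract describes, the sketch below builds BOW and character 3-gram count vectors and runs a K-NN vote using Cosine, Jaccard, or Dice similarity. It is not the authors' implementation: the function names, the use of raw term counts as weights, the vector (Tanimoto-style) forms of Jaccard and Dice, and the toy English tokens are all assumptions made for the example. The Inew measure is not defined in this record, so it is left out. Arabic text would be handled the same way, since Python strings are Unicode.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count whitespace-separated tokens (BOW); tokenization is an assumption."""
    return Counter(text.split())


def char_ngrams(text, n=3):
    """Count character-level n-grams (3-Gram when n=3)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def _dot(a, b):
    # Iterate over the smaller vector; Counter returns 0 for missing keys.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[k] for k, v in a.items())


def _sq_norm(a):
    return sum(v * v for v in a.values())


def cosine(a, b):
    denom = math.sqrt(_sq_norm(a) * _sq_norm(b))
    return _dot(a, b) / denom if denom else 0.0


def jaccard(a, b):
    # Vector (Tanimoto) form, assumed here: a.b / (|a|^2 + |b|^2 - a.b)
    d = _dot(a, b)
    denom = _sq_norm(a) + _sq_norm(b) - d
    return d / denom if denom else 0.0


def dice(a, b):
    # Vector form, assumed here: 2 a.b / (|a|^2 + |b|^2)
    denom = _sq_norm(a) + _sq_norm(b)
    return 2 * _dot(a, b) / denom if denom else 0.0


def knn_predict(query, training, k=3, similarity=cosine):
    """Majority vote over the k training vectors most similar to the query.

    `training` is a list of (vector, label) pairs.
    """
    ranked = sorted(training, key=lambda item: similarity(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, for illustration only.
training = [
    (bag_of_words("goal match team"), "sport"),
    (bag_of_words("team win match"), "sport"),
    (bag_of_words("bank market price"), "economy"),
    (bag_of_words("price stock market"), "economy"),
]
query = bag_of_words("match team score")
predicted = knn_predict(query, training, k=3, similarity=dice)
```

Passing `similarity=cosine`, `jaccard`, or `dice` swaps the measure without changing the classifier, which mirrors the kind of comparison the abstract reports. Replacing `bag_of_words` with `char_ngrams` switches the representation to 3-Gram.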
American Psychological Association (APA)
Alhutaish, Roiss & Umar, Nazlia. 2015. Arabic text classification using k-nearest neighbour algorithm. The International Arab Journal of Information Technology, Vol. 12, no. 2.
https://search.emarefa.net/detail/BIM-368792
Modern Language Association (MLA)
Alhutaish, Roiss & Umar, Nazlia. Arabic text classification using k-nearest neighbour algorithm. The International Arab Journal of Information Technology Vol. 12, no. 2 (Mar. 2015).
https://search.emarefa.net/detail/BIM-368792
American Medical Association (AMA)
Alhutaish, Roiss & Umar, Nazlia. Arabic text classification using k-nearest neighbour algorithm. The International Arab Journal of Information Technology. 2015. Vol. 12, no. 2.
https://search.emarefa.net/detail/BIM-368792
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references.
Record ID
BIM-368792