Automatic text classification : a comparative study
Joint Authors
Najadat, Hassan M.
Khalil, Muhammad Ibrahim
Hamidi, Ismail I.
Source
International Computer Sciences and Informatics Conference, Amman, Jordan, 12-13 January 2016.
Publication Date
2016-01-31
Country of Publication
Jordan
No. of Pages
11
English Abstract
The massive amount of semi-structured data contained in text documents makes classifying them manually a very difficult task.
Automatic text classification is the process of classifying documents based on their contents into a predefined set of categories.
This paper compares the performance of well-known text classification techniques, including genetic algorithm, k-nearest neighbour, decision tree, support vector machine, and naïve Bayes.
A light stemmer and the chi-square (Chi) method have been implemented as the preprocessing and feature selection techniques, respectively.
The effectiveness of the classifiers is evaluated in terms of the macro-averaged F1 measure.
In order to evaluate the five classification techniques, a text corpus has been collected.
Results showed that the support vector machine and naïve Bayes classifiers outperform the other classifiers in terms of classification accuracy.
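The abstract names two quantities without defining them: the Chi feature selection score and the macro-average F1 measure. The following is a minimal sketch of both, assuming that "Chi method" refers to the standard chi-square statistic computed on a 2x2 term/category contingency table, and that macro-average F1 is the unweighted mean of per-category F1 scores. The function names, argument names, and sample labels are hypothetical and are not taken from the paper.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category (assumed standard 2x2 form).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): treat the term as uninformative.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1_scores.append(2 * precision * recall / total if total else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Hypothetical counts and labels, for illustration only.
    print(chi_square(10, 0, 0, 10))
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Because macro-averaging gives every category equal weight regardless of its size, it can rank classifiers differently from plain classification accuracy on an unbalanced corpus.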
Data Type
Conference Papers
Record ID
BIM-767298
American Psychological Association (APA)
Hamidi, Ismail I., Khalil, Muhammad Ibrahim, & Najadat, Hassan M. 2016-01-31. Automatic text classification : a comparative study. pp. 179-189. Amman, Jordan : Amman Arab University.
https://search.emarefa.net/detail/BIM-767298
Modern Language Association (MLA)
Hamidi, Ismail I. … [et al.]. Automatic text classification : a comparative study. Amman, Jordan : Amman Arab University. 2016-01-31.
https://search.emarefa.net/detail/BIM-767298
American Medical Association (AMA)
Hamidi, Ismail I., Khalil, Muhammad Ibrahim, & Najadat, Hassan M. Automatic text classification : a comparative study.
https://search.emarefa.net/detail/BIM-767298