Automatic text classification: a comparative study

Publication Date

2016-01-31

Country of Publication

Jordan

No. of Pages

11

Main Subjects

Electronic engineering

English Abstract

The massive amount of semi-structured data contained in text documents makes classifying them manually a very difficult task.

Automatic text classification is the process of classifying documents based on their contents into a predefined set of categories.

This paper compares the performance of five well-known text classification techniques: genetic algorithm, k-nearest neighbour, decision tree, support vector machine and naïve Bayes.
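To illustrate one of the five compared classifiers, here is a minimal naïve Bayes sketch. The abstract does not say which naïve Bayes variant or smoothing the authors used; a multinomial model over token counts with Laplace smoothing (alpha = 1) is assumed here, and the class name `NaiveBayesSketch` is hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Multinomial naive Bayes with Laplace smoothing (an assumed variant)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one category name per document
        self.classes = sorted(set(labels))
        self.vocab = {t for tokens in docs for t in tokens}
        class_counts = Counter(labels)
        # log P(c): fraction of training documents in each category
        self.log_prior = {
            c: math.log(class_counts[c] / len(labels)) for c in self.classes
        }
        term_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            term_counts[label].update(tokens)
        v = len(self.vocab)
        # log P(t | c) with Laplace smoothing over the training vocabulary
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(term_counts[c].values())
            self.log_likelihood[c] = {
                t: math.log((term_counts[c][t] + self.alpha) / (total + self.alpha * v))
                for t in self.vocab
            }
        return self

    def predict(self, tokens):
        # Pick the category with the highest log posterior; unseen terms are ignored
        def score(c):
            return self.log_prior[c] + sum(
                self.log_likelihood[c][t] for t in tokens if t in self.vocab
            )

        return max(self.classes, key=score)
```

For example, after fitting on two "sport" documents containing "goal" and two "politics" documents containing "vote", `predict(["goal", "team"])` returns `"sport"`. Working in log space avoids floating-point underflow when a document contains many terms.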

A light stemmer and the chi-square (χ²) method were implemented as the preprocessing and feature selection techniques, respectively.
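The chi-square feature selection step can be sketched as follows. This is a minimal illustration under assumptions not stated in the abstract: documents are already tokenized (and stemmed), a term's score for a category is computed from the standard 2x2 table of document counts, and a term's overall score is its maximum over all categories. The helper names `chi_square` and `select_features` are hypothetical.

```python
def chi_square(docs, labels, term, category):
    """Chi-square statistic between a term and a category.

    Uses document counts:
      a = docs in category containing term
      b = docs outside category containing term
      c = docs in category without term
      d = docs outside category without term
    chi2 = n * (a*d - c*b)^2 / ((a+c) * (b+d) * (a+b) * (c+d))
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of dependence
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square (max over categories, assumed)."""
    vocab = {t for tokens in docs for t in tokens}
    categories = set(labels)
    scores = {
        t: max(chi_square(docs, labels, t, cat) for cat in categories) for t in vocab
    }
    # Sort by descending score, breaking ties alphabetically for determinism
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Taking the maximum across categories keeps terms that are strongly tied to any single category; averaging the per-category scores is a common alternative.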

The effectiveness of the classifiers is evaluated in terms of the macro-averaged F1 measure.
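Macro-averaged F1 computes precision, recall and F1 separately for each category and then takes their unweighted mean, so small categories count as much as large ones. A minimal sketch follows; the function name `macro_f1` is hypothetical, and treating an undefined precision or recall as 0 is an assumed convention.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        # Undefined ratios (zero denominators) are treated as 0
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)
```

For `y_true = ["a", "a", "b", "b"]` and `y_pred = ["a", "b", "b", "b"]`, class a has F1 = 2/3 and class b has F1 = 4/5, so the macro-averaged F1 is 11/15 (about 0.733).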

To evaluate the five classification techniques, a text corpus was collected.

Results showed that the support vector machine and naïve Bayes classifiers outperform the other classifiers in terms of classification accuracy.

Data Type

Conference Papers

Record ID

BIM-767298

American Psychological Association (APA)

Hamidi, Ismail I., Khalil, Muhammad Ibrahim & Najadat, Hassan M. (2016-01-31). Automatic text classification: a comparative study (pp. 179-189). Amman, Jordan: Amman Arab University.
https://search.emarefa.net/detail/BIM-767298

Modern Language Association (MLA)

Hamidi, Ismail I., et al. Automatic text classification: a comparative study. Amman, Jordan: Amman Arab University, 2016-01-31.
https://search.emarefa.net/detail/BIM-767298

American Medical Association (AMA)

Hamidi, Ismail I., Khalil, Muhammad Ibrahim & Najadat, Hassan M. Automatic text classification: a comparative study.
https://search.emarefa.net/detail/BIM-767298