An intelligent system for Arabic text categorization

Joint Authors

Siyam, M. M.
Fayid, Z. T.
Habib, M. B.

Source

International Journal of Intelligent Computing and Information Sciences

Issue

Vol. 6, Issue 1 (31 Jan. 2006), 19 p.

Publisher

Ain Shams University Faculty of Computer and Information Sciences

Publication Date

2006-01-31

Country of Publication

Egypt

No. of Pages

19

Main Subjects

Information Technology and Computer Science

Abstract EN

Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content.

In this paper, an intelligent Arabic text categorization system is presented.

The system is built on machine learning algorithms.

Several stemming and feature selection algorithms are evaluated.

Documents are represented using several term weighting schemes, and the k-nearest neighbor and Rocchio classifiers are then used for classification.
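The representation and classification steps named above can be illustrated with a minimal sketch. It assumes L2-normalized tf-idf weights with idf(t) = log(N / df(t)) and a simplified, centroid-only form of the Rocchio classifier with cosine similarity; the abstract does not give the paper's exact formulas, so these choices, all function names, and the toy data (English tokens standing in for stemmed Arabic terms) are illustrative assumptions only.

```python
import math
from collections import Counter, defaultdict


def l2_normalize(vec):
    """Scale a sparse vector (term -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def fit_idf(docs):
    """Compute idf(t) = log(N / df(t)) from a list of token lists."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / df[t]) for t in df}


def tfidf(doc, idf):
    """Normalized tf-idf vector for one document; unseen terms are ignored."""
    tf = Counter(t for t in doc if t in idf)
    return l2_normalize({t: tf[t] * idf[t] for t in tf})


def train_rocchio(vectors, labels):
    """Build one centroid per category by averaging its document vectors."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {c: {t: w / counts[c] for t, w in terms.items()}
            for c, terms in sums.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def classify(vec, centroids):
    """Assign the category whose centroid is most similar to the document."""
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


# Made-up toy corpus, not data from the paper.
train_docs = [
    ["goal", "match", "team"],
    ["team", "player", "match"],
    ["bank", "market", "price"],
    ["price", "bank", "loan"],
]
train_labels = ["sports", "sports", "economy", "economy"]

idf = fit_idf(train_docs)
centroids = train_rocchio([tfidf(d, idf) for d in train_docs], train_labels)
print(classify(tfidf(["match", "goal"], idf), centroids))
```

A k-nearest neighbor classifier would reuse the same vectors and cosine function but compare a new document against every training document rather than against one centroid per category.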

Experiments are performed on a self-collected corpus, and the results show that the proposed hybrid of statistical and light stemmers is the most suitable stemming algorithm for the Arabic language.

The results also show that a hybrid of document frequency and information gain is the preferable feature selection criterion and that normalized tf-idf is the best weighting scheme.
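One plausible way to combine the two criteria is sketched below: terms are first pruned by a document-frequency threshold and the survivors are ranked by information gain. The abstract does not say how the paper combines them, so this pipeline, the names `min_df` and `top_k`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a collection of category counts."""
    total = sum(counts)
    if not total:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t) H(C | t present) - P(not t) H(C | t absent)."""
    n = len(docs)
    present = Counter(l for d, l in zip(docs, labels) if term in d)
    absent = Counter(l for d, l in zip(docs, labels) if term not in d)
    n_present = sum(present.values())
    n_absent = n - n_present
    return (entropy(Counter(labels).values())
            - (n_present / n) * entropy(present.values())
            - (n_absent / n) * entropy(absent.values()))


def select_features(docs, labels, min_df=2, top_k=2):
    """Assumed hybrid: drop terms with df < min_df, then keep the top_k by IG."""
    df = Counter(t for d in docs for t in set(d))
    candidates = [t for t, count in df.items() if count >= min_df]
    # Sort by descending IG; break ties alphabetically for determinism.
    ranked = sorted(candidates,
                    key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:top_k]


# Made-up toy corpus, not data from the paper.
docs = [
    ["goal", "team", "the"],
    ["team", "match", "the"],
    ["bank", "price", "the"],
    ["bank", "loan", "the"],
]
labels = ["sports", "sports", "economy", "economy"]
print(select_features(docs, labels))
```

In this toy corpus, a term that occurs in every document carries no information about the category, so it ranks below terms that occur in only one category even though its document frequency is the highest.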

Finally, the Rocchio classifier outperforms the k-nearest neighbor classifier in classification.

The experimental results show that the proposed model is efficient and achieves a generalization accuracy of about 98%.

American Psychological Association (APA)

Siyam, M. M., Fayid, Z. T., & Habib, M. B. (2006). An intelligent system for Arabic text categorization. International Journal of Intelligent Computing and Information Sciences, Vol. 6, no. 1.
https://search.emarefa.net/detail/BIM-284442

Modern Language Association (MLA)

Siyam, M. M., et al. An intelligent system for Arabic text categorization. International Journal of Intelligent Computing and Information Sciences, Vol. 6, no. 1 (Jan. 2006).
https://search.emarefa.net/detail/BIM-284442

American Medical Association (AMA)

Siyam, M. M., Fayid, Z. T., Habib, M. B. An intelligent system for Arabic text categorization. International Journal of Intelligent Computing and Information Sciences. 2006. Vol. 6, no. 1.
https://search.emarefa.net/detail/BIM-284442

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references.

Record ID

BIM-284442