An intelligent system for Arabic text categorization
Joint Authors
Siyam, M. M.
Fayid, Z. T.
Habib, M. B.
Source
International Journal of Intelligent Computing and Information Sciences
Issue
Vol. 6, Issue 1 (31 Jan. 2006), 19 p.
Publisher
Ain Shams University Faculty of Computer and Information Sciences
Publication Date
2006-01-31
Country of Publication
Egypt
No. of Pages
19
Main Subjects
Information Technology and Computer Science
Abstract EN
Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content.
In this paper, an intelligent Arabic text categorization system is presented.
Machine learning algorithms are used in this system.
Many algorithms for stemming and feature selection are tried.
Moreover, documents are represented using several term weighting schemes, and the k-nearest neighbor and Rocchio classifiers are then used for the classification process.
Experiments are performed on a self-collected corpus, and the results show that the suggested hybrid of statistical and light stemmers is the most suitable stemming algorithm for the Arabic language.
The results also show that a hybrid of document frequency and information gain is the preferred feature selection criterion and that normalized tf-idf is the best weighting scheme.
Finally, the Rocchio classifier has the advantage over the k-nearest neighbor classifier in the classification process.
The experimental results show that the proposed model is an efficient method and achieves a generalization accuracy of about 98%.
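The weighting and classification steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain `log(N/df)` idf with L2 normalization as the "normalized tf-idf" scheme, and a centroid-only Rocchio variant (no negative-example term) scored by cosine similarity. The function names, the toy English tokens, and the category labels are all hypothetical; stemming and feature selection are omitted.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Compute idf = log(N / df) for every term in the tokenized training docs."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def vectorize(doc, idf):
    """Build an L2-normalized tf-idf vector (term -> weight); unknown terms are dropped."""
    tf = Counter(term for term in doc if term in idf)
    vec = {term: freq * idf[term] for term, freq in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {term: w / norm for term, w in vec.items()}


def train_rocchio(vectors, labels):
    """Centroid-only Rocchio: average the document vectors of each category."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for term, w in vec.items():
            sums[label][term] += w
    return {
        label: {term: w / counts[label] for term, w in terms.items()}
        for label, terms in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(vec, centroids):
    """Assign the category whose centroid is most similar to the document vector."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy corpus (already tokenized); a real system would stem Arabic text first.
train_docs = [
    ["match", "goal", "team"],
    ["team", "player", "goal"],
    ["market", "stock", "price"],
    ["price", "bank", "market"],
]
train_labels = ["sports", "sports", "economy", "economy"]

idf = fit_idf(train_docs)
centroids = train_rocchio([vectorize(d, idf) for d in train_docs], train_labels)
prediction = classify(vectorize(["goal", "player", "match"], idf), centroids)
print(prediction)
```

A k-nearest neighbor classifier could reuse the same `vectorize` and `cosine` helpers by voting over the k most similar training vectors instead of comparing against one centroid per category.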
American Psychological Association (APA)
Siyam, M. M., Fayid, Z. T., & Habib, M. B. 2006. An intelligent system for Arabic text categorization. International Journal of Intelligent Computing and Information Sciences, Vol. 6, no. 1.
https://search.emarefa.net/detail/BIM-284442
Modern Language Association (MLA)
Siyam, M. M.…[et al.]. An intelligent system for Arabic text categorization. International Journal of Intelligent Computing and Information Sciences, Vol. 6, no. 1 (Jan. 2006).
https://search.emarefa.net/detail/BIM-284442
American Medical Association (AMA)
Siyam, M. M., Fayid, Z. T., & Habib, M. B. An intelligent system for Arabic text categorization. International Journal of Intelligent Computing and Information Sciences. 2006. Vol. 6, no. 1.
https://search.emarefa.net/detail/BIM-284442
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references.
Record ID
BIM-284442