The impact of text preprocessing and term weighting on Arabic text classification
Other Title(s)
أثر معالجة النصوص و توزين الكلمات على تصنيف النصوص العربية
Dissertant
Sad, Mutazz Khalid
Thesis advisor
Committee Members
Abu Haybah, Ibrahim S. I.
al-Halis, Ala Mustafa
University
Islamic University
Faculty
Faculty of Engineering
Department
Department of Computer Engineering
University Country
Palestine (Gaza Strip)
Degree
Master
Degree Date
2010
English Abstract
This research investigates and compares the impact of text preprocessing, which has not been addressed before, on Arabic text classification using popular text classification algorithms: Decision Tree, K-Nearest Neighbors, Support Vector Machines, and Naïve Bayes and its variations.
Text preprocessing includes applying different term weighting schemes and Arabic morphological analysis (stemming and light stemming).
We implemented and integrated Arabic morphological analysis tools within the leading open-source machine learning tools Weka and RapidMiner.
The text classification algorithms are applied to seven Arabic corpora (three collected in-house and four existing corpora).
Experimental results show: (1) Light stemming with term pruning is the best feature reduction technique.
(2) Support Vector Machines and Naïve Bayes variations outperform other algorithms.
(3) Weighting schemes affect the performance of distance-based classifiers.
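To illustrate what term weighting and term pruning mean in practice, the following is a minimal Python sketch, not the thesis implementation (which the abstract says was built on Weka and RapidMiner). The function names `term_weights` and `euclidean`, the `min_df` pruning threshold, the particular TF-IDF formula, and the English placeholder tokens are all assumptions made for illustration; in the thesis setting the tokens would be stemmed or light-stemmed Arabic terms.

```python
import math
from collections import Counter


def term_weights(docs, scheme="tfidf", min_df=1):
    """Return one {term: weight} dict per tokenized document.

    scheme: "binary", "tf" (raw count), or "tfidf" (tf * log(N / df)).
    min_df: terms found in fewer than min_df documents are pruned.
    (Hypothetical helper; the schemes used in the thesis may differ.)
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        # Term pruning: keep only terms that are frequent enough.
        tf = Counter(t for t in doc if df[t] >= min_df)
        if scheme == "binary":
            vec = {t: 1.0 for t in tf}
        elif scheme == "tf":
            vec = {t: float(c) for t, c in tf.items()}
        elif scheme == "tfidf":
            vec = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        vectors.append(vec)
    return vectors


def euclidean(a, b):
    """Euclidean distance between two sparse {term: weight} vectors."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))


# Placeholder tokens standing in for preprocessed Arabic terms.
docs = [
    ["economy", "market", "market", "bank"],
    ["economy", "sport", "match"],
    ["sport", "match", "match", "goal"],
]
binary = term_weights(docs, "binary")
tfidf = term_weights(docs, "tfidf", min_df=2)
print(euclidean(binary[0], binary[1]))
print(euclidean(tfidf[0], tfidf[1]))
```

Because a distance-based classifier such as K-Nearest Neighbors compares documents through a function like `euclidean`, changing the weighting scheme changes the distances it sees, which is one plausible reading of finding (3).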
Main Subjects
Information Technology and Computer Science
Topics
No. of Pages
100
Table of Contents
Table of contents.
Abstract.
Chapter 1 : Introduction.
Chapter 2 : Related work.
Chapter 3 : Text classifiers.
Chapter 4 : Text preprocessing.
Chapter 5 : Corpora.
Chapter 6 : Experimental results and analysis.
Chapter 7 : Conclusion and future work.
American Psychological Association (APA)
Sad, Mutazz Khalid. (2010). The impact of text preprocessing and term weighting on Arabic text classification. (Master's thesis). Islamic University, Palestine (Gaza Strip).
https://search.emarefa.net/detail/BIM-300841
Modern Language Association (MLA)
Sad, Mutazz Khalid. The impact of text preprocessing and term weighting on Arabic text classification. (Master's thesis). Islamic University. (2010).
https://search.emarefa.net/detail/BIM-300841
American Medical Association (AMA)
Sad, Mutazz Khalid. (2010). The impact of text preprocessing and term weighting on Arabic text classification. (Master's thesis). Islamic University, Palestine (Gaza Strip).
https://search.emarefa.net/detail/BIM-300841
Language
English
Data Type
Arab Theses
Record ID
BIM-300841