Document classification method based on contents using an improved multinomial naïve Bayes model
Other Title(s)
A method for classifying documents based on their contents using an improved multinomial naïve Bayes model (English translation of the Arabic title)
Dissertant
al-Bayati, Junaina Jamil Najm al-Din
Thesis advisor
al-Husayni, Muhammad Abbas Fadil
Committee Members
al-Jarrah, Muzaffar
Kanan, Ghassan Ghazi
University
Middle East University
Faculty
Faculty of Information Technology
Department
Computer Science Department
University Country
Jordan
Degree
Master
Degree Date
2015
English Abstract
Today, a large number of Arabic documents are available in most of the applications we use in daily life. These documents need to be organized and categorized by topic so that they are more meaningful and more useful. Text classification is one approach to organizing Arabic documents: it is the technique of determining which topic a given text belongs to. Numerous studies have sought to improve the performance of document classification, particularly for Arabic documents. Arabic is a rich and highly inflectional language, which makes simple, conventional approaches difficult to apply.
This research improves the performance of the multinomial naïve Bayes (MNB) classifier using three different approaches: first by adding n-grams only, second by applying TF-IDF only, and third by combining n-grams and TF-IDF. The improved classifiers were then applied to an Arabic data set of about 1,500 distinct Arabic documents covering six classes, and each classifier was evaluated by its recall, precision and F-measure.
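The record does not describe how the three variants were implemented. The sketch below is one plausible reading, not the author's code, and rests on these assumptions: input is already tokenized, the "n-gram" variant adds word bigrams alongside unigrams, IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1 (the default in scikit-learn's `TfidfVectorizer`), and TF-IDF weights are fed to multinomial naive Bayes as fractional counts with add-one smoothing. The names `ngrams`, `features`, `fit_idf`, `apply_tfidf`, `classify` and `TRAIN`, and the toy English documents standing in for preprocessed Arabic text, are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Contiguous word n-grams, each joined with a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(tokens, use_bigrams):
    """Raw term counts over unigrams, plus bigrams when use_bigrams is True."""
    terms = list(tokens)
    if use_bigrams:
        terms += ngrams(tokens, 2)
    return Counter(terms)


def fit_idf(train_counts):
    """Smoothed IDF fitted on training documents only (assumed variant)."""
    n_docs = len(train_counts)
    df = Counter(t for counts in train_counts for t in counts)
    return {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}


def apply_tfidf(counts, idf):
    """Multiply each raw count by its IDF; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


class MultinomialNB:
    """Multinomial naive Bayes with add-alpha smoothing over (possibly fractional) term weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        vocab = {t for d in docs for t in d}
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        totals = {c: defaultdict(float) for c in self.classes}
        for doc, c in zip(docs, labels):
            for t, w in doc.items():
                totals[c][t] += w
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(vocab)
            self.log_lik[c] = {t: math.log((totals[c][t] + self.alpha) / denom) for t in vocab}
        return self

    def predict(self, doc):
        def log_posterior(c):
            # Each term contributes weight * log P(term | class); out-of-vocabulary terms are ignored.
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in doc.items() if t in self.log_lik[c])
        return max(self.classes, key=log_posterior)


def classify(train, tokens, use_bigrams, use_tfidf):
    """Train on (tokens, label) pairs and label one token list under one configuration."""
    train_counts = [features(toks, use_bigrams) for toks, _ in train]
    labels = [label for _, label in train]
    test_counts = features(tokens, use_bigrams)
    if use_tfidf:
        idf = fit_idf(train_counts)
        train_counts = [apply_tfidf(c, idf) for c in train_counts]
        test_counts = apply_tfidf(test_counts, idf)
    return MultinomialNB().fit(train_counts, labels).predict(test_counts)


# Hypothetical toy corpus; stands in for preprocessed Arabic documents.
TRAIN = [
    ("the team won the match".split(), "sport"),
    ("the player scored a goal".split(), "sport"),
    ("the bank raised interest rates".split(), "economy"),
    ("the market fell on rates news".split(), "economy"),
]

if __name__ == "__main__":
    configs = {"bigram": (True, False), "TF-IDF": (False, True), "bigram + TF-IDF": (True, True)}
    for name, (use_bigrams, use_tfidf) in configs.items():
        label = classify(TRAIN, "the player scored in the match".split(), use_bigrams, use_tfidf)
        print(f"{name}: {label}")
```

On this toy data all three configurations label "the player scored in the match" as sport. The toy set is far too small to say anything about the relative F-measures reported for the real data set.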
The average F-measure across all classes was 81.46% when applying bigrams, 88.88% when applying TF-IDF, and 89.70% when applying the combination of bigrams and TF-IDF.
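The figures above are averages of per-class F-measures. A minimal sketch of that computation, assuming "F-measure" means the balanced F1 score (2PR / (P + R)) and "average over all classes" means an unweighted (macro) mean; the function names and toy labels are hypothetical.

```python
def per_class_prf(y_true, y_pred):
    """Return {class: (precision, recall, F1)} computed one-vs-rest for each class."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_prf(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    y_true = ["sport", "sport", "economy", "economy"]
    y_pred = ["sport", "economy", "economy", "economy"]
    print(per_class_prf(y_true, y_pred))
    print(macro_f1(y_true, y_pred))
```

In the toy example, sport has P = 1 and R = 0.5 (F1 about 0.667), economy has P = 2/3 and R = 1 (F1 = 0.8), so the macro average is about 0.733. A frequency-weighted average would give different numbers whenever the classes are unbalanced.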
The differences in F-measure among the proposed classifiers show that the classifier enhanced with both TF-IDF and bigrams achieved the highest values, making it the most effective of the three proposed classifiers.
The classifier enhanced with TF-IDF only ranked second in effectiveness, followed by the classifier enhanced with bigrams only.
Keywords: Multinomial Naïve Bayes, TF-IDF (Term Frequency-Inverse Document Frequency), N-gram, Arabic Data Set, Tokenization, Stemming, Stop Word Removal.
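The keywords name three preprocessing steps: tokenization, stop word removal and stemming. The record does not say which stop list or stemmer the thesis used, so the sketch below is illustrative only. It assumes: diacritics in U+064B to U+0652 are stripped, tokens are runs of Arabic letters in U+0621 to U+064A, the stop list and affix lists are tiny hypothetical examples, and `light_stem` is a naive light stemmer that removes at most one prefix and one suffix while keeping at least three letters. A real system would use a curated stop list and an established stemmer (for example NLTK's `ISRIStemmer`).

```python
import re

# Hypothetical, deliberately tiny lists for illustration only.
STOP_WORDS = {"في", "من", "على", "إلى", "عن"}
PREFIXES = ("وال", "بال", "كال", "فال", "ال")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")

DIACRITICS = re.compile(r"[\u064B-\u0652]")  # tanween, short vowels, shadda, sukun
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")  # runs of Arabic letters


def tokenize(text):
    """Strip diacritics, then split on anything that is not an Arabic letter."""
    return ARABIC_WORD.findall(DIACRITICS.sub("", text))


def light_stem(token, min_len=3):
    """Remove at most one listed prefix and one listed suffix, keeping min_len letters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= min_len:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= min_len:
            token = token[:-len(s)]
            break
    return token


def preprocess(text):
    """Tokenize, drop stop words, then light-stem what remains."""
    return [light_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("الكتاب في المكتبة"))
```

For "الكتاب في المكتبة", the stop word "في" is dropped, "الكتاب" loses the article to give "كتاب", and "المكتبة" loses the article and the teh marbuta to give "مكتب". Stop words are removed before stemming so that the stop list can be matched against surface forms.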
Main Subjects
Information Technology and Computer Science
No. of Pages
74
Table of Contents
Table of contents
Abstract
Abstract in Arabic
Chapter One: Introduction
Chapter Two: Literature review
Chapter Three: Methodology and proposed models
Chapter Four: The results
Chapter Five: The evaluation
Chapter Six: Conclusion and future work
References
American Psychological Association (APA)
al-Bayati, Junaina Jamil Najm al-Din. (2015). Document classification method based on contents using an improved multinomial naïve Bayes model (Master's thesis). Middle East University, Jordan.
https://search.emarefa.net/detail/BIM-698776
Modern Language Association (MLA)
al-Bayati, Junaina Jamil Najm al-Din. Document classification method based on contents using an improved multinomial naïve Bayes model. Master's thesis, Middle East University, 2015.
https://search.emarefa.net/detail/BIM-698776
Language
English
Data Type
Arab Theses
Record ID
BIM-698776