Document classification method based on contents using an improved multinomial naïve Bayes model

Other titles

طريقة تصنيف الوثيقة استنادا إلى محتوياتها باستخدام تحسين نموذج متعدد الحدود نيف بايز

Thesis author

al-Bayati, Junaina Jamil Najm al-Din

Thesis advisor

al-Husayni, Muhammad Abbas Fadil

Committee members

al-Jarrah, Muzaffar
Kanan, Ghassan Ghazi

University

Middle East University

Faculty

Faculty of Information Technology

Academic department

Department of Computer Science

University country

Jordan

Degree

Master's

Degree date

2015

English abstract

A large number of Arabic documents are now available in most everyday applications. To be more meaningful and more useful, these documents have to be organized and categorized by topic. Text classification is one of the approaches used to organize Arabic documents: it determines which topic a given text belongs to. Numerous studies have been conducted in this field to improve the performance of document classification, particularly for Arabic documents. Arabic is a rich and highly inflectional language, which makes simple, conventional approaches difficult to apply.

This research aims to improve the performance of the multinomial naive Bayes (MNB) classifier using three different approaches: first, by adding n-grams only; second, by applying TF-IDF; and third, by using both n-grams and TF-IDF. The improved classifiers were then evaluated on the recall, precision, and F-measure obtained by each classifier when applied to an Arabic data set that covers six classes and contains about 1,500 distinct Arabic documents.
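
The three variants described above can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the class name `SimpleMNB`, the `use_bigrams` and `use_tfidf` switches, the smoothed IDF formula, the Laplace smoothing constant `alpha`, and the toy English tokens are all assumptions made for illustration, and the preprocessing steps (tokenization, stemming, stop word removal) are assumed to have been applied already.

```python
import math
from collections import Counter, defaultdict


def extract_features(text, use_bigrams):
    """Return whitespace tokens plus, optionally, adjacent-token bigrams."""
    tokens = text.split()
    feats = list(tokens)
    if use_bigrams:
        feats += [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return feats


class SimpleMNB:
    """Hypothetical multinomial naive Bayes with optional bigrams and TF-IDF."""

    def __init__(self, use_bigrams=True, use_tfidf=True, alpha=1.0):
        self.use_bigrams = use_bigrams
        self.use_tfidf = use_tfidf
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        counted = [Counter(extract_features(d, self.use_bigrams)) for d in docs]
        n = len(docs)
        df = Counter()  # document frequency of each feature
        for counts in counted:
            df.update(counts.keys())
        # Assumed smoothed IDF; weight 1.0 means plain term counts.
        self.idf = {
            t: (math.log((1 + n) / (1 + df[t])) + 1.0) if self.use_tfidf else 1.0
            for t in df
        }
        totals = defaultdict(Counter)  # per-class summed feature weights
        for counts, y in zip(counted, labels):
            for t, tf in counts.items():
                totals[y][t] += tf * self.idf[t]
        self.log_prior = {y: math.log(c / n) for y, c in Counter(labels).items()}
        self.log_like = {}
        for y in self.log_prior:
            weights = totals[y]
            denom = sum(weights.values()) + self.alpha * len(df)
            self.log_like[y] = {
                t: math.log((weights[t] + self.alpha) / denom) for t in df
            }
        return self

    def predict(self, text):
        counts = Counter(extract_features(text, self.use_bigrams))
        # Features never seen in training are ignored.
        w = {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}
        scores = {
            y: self.log_prior[y] + sum(v * self.log_like[y][t] for t, v in w.items())
            for y in self.log_prior
        }
        return max(scores, key=scores.get)


# Toy, already-preprocessed documents (illustrative only, not the thesis data).
docs = [
    "team wins football match",
    "player scores goal match",
    "bank raises interest rate",
    "market stock price falls",
]
labels = ["sport", "sport", "economy", "economy"]

variants = {
    "bigram only": SimpleMNB(use_bigrams=True, use_tfidf=False).fit(docs, labels),
    "tf-idf only": SimpleMNB(use_bigrams=False, use_tfidf=True).fit(docs, labels),
    "bigram + tf-idf": SimpleMNB(use_bigrams=True, use_tfidf=True).fit(docs, labels),
}
for name, model in variants.items():
    print(name, model.predict("football player scores"))
```

The two switches make it possible to compare the three configurations on the same data; on a real corpus the differences between them would only show up in the evaluation metrics.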

The average F-measure over all classes was 81.46% when applying bigrams, 88.88% when applying TF-IDF, and 89.70% when applying the combination of bigrams and TF-IDF.
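
As a rough illustration of how an average F-measure over classes can be computed, the sketch below derives precision, recall, and F-measure per class and then takes their unweighted mean. The abstract does not state whether the reported averages are unweighted, so that choice, the function names, and the toy labels are assumptions.

```python
def per_class_scores(y_true, y_pred):
    """Return {class: (precision, recall, f_measure)} for every class seen."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F-measure is the harmonic mean of precision and recall.
        if precision + recall:
            f_measure = 2 * precision * recall / (precision + recall)
        else:
            f_measure = 0.0
        scores[c] = (precision, recall, f_measure)
    return scores


def average_f_measure(scores):
    """Unweighted (macro) mean of the per-class F-measures."""
    return sum(f for _, _, f in scores.values()) / len(scores)


# Toy labels for three hypothetical classes (not the thesis results).
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

scores = per_class_scores(y_true, y_pred)
for cls, (p, r, f) in scores.items():
    print(f"{cls}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
print(f"average F-measure: {average_f_measure(scores):.4f}")
```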

The differences in F-measure among the proposed classifiers showed that the classifier enhanced with both TF-IDF and bigrams achieved the highest values and is the most effective of the three proposed classifiers.

The classifier enhanced with TF-IDF only ranked second in effectiveness, and the classifier enhanced with bigrams only ranked third.

Keywords: Multinomial Naïve Bayes, TF-IDF (Term Frequency-Inverse Document Frequency), N-gram, Arabic Data Set, Tokenization, Stemming, Stop Word Removal.

Main subjects

Information Technology and Computer Science

Number of pages

74

Table of contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Literature review.

Chapter Three: Methodology and proposed models.

Chapter Four: The results.

Chapter Five: The evaluation.

Chapter Six: Conclusion and future work.

References.

American Psychological Association (APA) citation style

al-Bayati, Junaina Jamil Najm al-Din. (2015). Document classification method based on contents using an improved multinomial naïve Bayes model. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-698776

Modern Language Association (MLA) citation style

al-Bayati, Junaina Jamil Najm al-Din. Document classification method based on contents using an improved multinomial naïve Bayes model. (Master's thesis). Middle East University. (2015).
https://search.emarefa.net/detail/BIM-698776

American Medical Association (AMA) citation style

al-Bayati, Junaina Jamil Najm al-Din. (2015). Document classification method based on contents using an improved multinomial naïve Bayes model. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-698776

Text language

English

Data type

Theses

Record number

BIM-698776