Arabic text classification techniques using the multivariate Bernoulli model

Other titles

تقنيات تصنيف النصوص العربية باستخدام نموذج متعدد المتغيرات برنولي

Thesis author

al-Arqat, Latifah Faraj

Thesis advisor

Kanan, Ghassan

Committee members

al-Dabbas, Umar Ṣuhaib
al-Hamami, Ala Husayn

University

Amman Arab University

College

College of Computer Sciences and Informatics

Academic department

Department of Computer Science

University country

Jordan

Degree

Master's

Degree date

2013

English abstract

Document classification is currently one of the most important areas of information retrieval.

It aims to map text documents to one or more predefined classes or categories based on the keywords they contain.

This study focuses on the problem of Arabic text classification using Naïve Bayes (NB) and the Multivariate Bernoulli Model (MBM). The NB and MBM classifiers are compared with the k-nearest neighbor (k-NN) and Rocchio classifiers.
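As an illustration of the multivariate Bernoulli event model named above, the following is a minimal sketch, not the thesis's implementation. It assumes documents are already tokenized, uses add-one (Laplace) smoothing, and uses English placeholder tokens in place of Arabic terms; the function names `train_bernoulli_nb` and `classify` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(docs, labels):
    """Estimate class priors and per-class term presence probabilities.

    docs: list of token lists; labels: list of class names (same length).
    """
    vocab = sorted({t for d in docs for t in d})
    class_docs = defaultdict(int)                       # documents per class
    term_docs = defaultdict(lambda: defaultdict(int))   # class -> term -> docs containing term
    for tokens, c in zip(docs, labels):
        class_docs[c] += 1
        for t in set(tokens):                           # presence only, not frequency
            term_docs[c][t] += 1
    n = len(docs)
    prior = {c: class_docs[c] / n for c in class_docs}
    # Add-one smoothing over the two outcomes (present / absent).
    cond = {
        c: {t: (term_docs[c][t] + 1) / (class_docs[c] + 2) for t in vocab}
        for c in class_docs
    }
    return vocab, prior, cond


def classify(tokens, model):
    """Return the class with the highest log-posterior score."""
    vocab, prior, cond = model
    present = set(tokens)
    best, best_score = None, -math.inf
    for c in prior:
        score = math.log(prior[c])
        # Every vocabulary term contributes, whether present or absent.
        for t in vocab:
            p = cond[c][t]
            score += math.log(p) if t in present else math.log(1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy training data (hypothetical, for illustration only).
train_docs = [
    ["match", "goal", "team"],
    ["team", "player", "goal"],
    ["bank", "market", "price"],
    ["market", "price", "trade"],
]
train_labels = ["sports", "sports", "economy", "economy"]
model = train_bernoulli_nb(train_docs, train_labels)
print(classify(["goal", "team", "win"], model))
```

Unlike the multinomial variant, this model ignores how often a term occurs and explicitly penalizes vocabulary terms that are absent from the document. Terms outside the training vocabulary, such as "win" here, are ignored.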

The experiments are conducted on a corpus of more than 1445 Arabic documents classified into nine categories.

The research evaluates these techniques using the standard measures of recall, precision, and F-measure as the basis of comparison.
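The macro-averaged form of these measures can be sketched as follows. This is a hedged illustration, not the thesis's evaluation code: the function name `macro_scores` and the toy labels are hypothetical, and the macro F-measure is taken here as the mean of the per-class F1 values, which is one common convention among several.

```python
def macro_scores(y_true, y_pred):
    """Return (macro-precision, macro-recall, macro-F1) over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    # Macro-averaging weights every class equally, regardless of its size.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels (hypothetical, for illustration only).
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
macro_p, macro_r, macro_f = macro_scores(y_true, y_pred)
print(round(macro_p, 3), round(macro_r, 3), round(macro_f, 3))
```

Because each class counts equally, a small category that is classified poorly lowers the macro scores as much as a large one would.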

The experiments show that the NB classifier using MBM is highly effective.

It outperformed the k-NN and Rocchio classifiers.

MBM macro-precision and macro-recall reached 0.86 and 0.831, respectively. In general, the Naïve Bayes algorithm using MBM outperformed the other two classifiers, k-NN and Rocchio.

The Naïve Bayes algorithm using the MBM classifier has the best precision.

The Rocchio classifier comes in second place.

The worst classifier on this data set was the k-NN classifier. The results could be slightly better if the number of documents were increased to 5070.

Main subjects

Mathematics

Topics

Number of pages

67

Table of contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Literature reviews.

Chapter Three: The methodology.

Chapter Four: Experiments and evaluation.

Chapter Five: Conclusion and future work.

References.

American Psychological Association (APA) citation style

al-Arqat, Latifah Faraj. (2013). Arabic text classification techniques using the multivariate Bernoulli model. (Master's thesis). Amman Arab University, Jordan.
https://search.emarefa.net/detail/BIM-529295

Modern Language Association (MLA) citation style

al-Arqat, Latifah Faraj. Arabic text classification techniques using the multivariate Bernoulli model. (Master's thesis). Amman Arab University. (2013).
https://search.emarefa.net/detail/BIM-529295

American Medical Association (AMA) citation style

al-Arqat, Latifah Faraj. (2013). Arabic text classification techniques using the multivariate Bernoulli model. (Master's thesis). Amman Arab University, Jordan.
https://search.emarefa.net/detail/BIM-529295

Text language

English

Data type

University theses

Record number

BIM-529295