Arabic text classification techniques using the multivariate Bernoulli model

Other Title(s)

تقنيات تصنيف النصوص العربية باستخدام نموذج متعدد المتغيرات برنولي

Dissertant

al-Arqat, Latifah Faraj

Thesis advisor

Kanan, Ghassan

Committee Members

al-Dabbas, Umar Ṣuhaib
al-Hamami, Ala Husayn

University

Amman Arab University

Faculty

College of Computer Sciences and Informatics

Department

Department of Computer Science

University Country

Jordan

Degree

Master

Degree Date

2013

English Abstract

Document classification is currently one of the most important areas of information retrieval.

It aims to map text documents to one or more predefined classes or categories based on the keywords they contain.

This study focuses on the problem of Arabic text classification using Naïve Bayes (NB) and the multivariate Bernoulli model (MBM). The NB and MBM classifiers were compared with the k-nearest neighbor (k-NN) and Rocchio classifiers.
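As an illustration of the multivariate Bernoulli model named above, the following is a minimal sketch and not the thesis's implementation, which this record does not show. It assumes documents are already tokenized, uses Laplace (add-one) smoothing, and scores every vocabulary term as either present or absent in a document. The function names `train_bernoulli_nb` and `classify_bernoulli_nb` and the English toy tokens are hypothetical; Arabic preprocessing such as stemming or stop-word removal is out of scope here.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(docs, labels):
    """Estimate priors and per-class term-presence probabilities.

    docs   : list of token lists (already tokenized; an assumption)
    labels : parallel list of class names
    """
    vocab = sorted({t for tokens in docs for t in tokens})
    n_docs = defaultdict(int)                    # documents per class
    df = defaultdict(lambda: defaultdict(int))   # class -> term -> doc count

    for tokens, c in zip(docs, labels):
        n_docs[c] += 1
        for t in set(tokens):                    # presence only, not frequency
            df[c][t] += 1

    total = len(docs)
    prior = {c: n / total for c, n in n_docs.items()}
    # Laplace smoothing: (docs in class containing t + 1) / (docs in class + 2)
    cond = {
        c: {t: (df[c][t] + 1) / (n_docs[c] + 2) for t in vocab}
        for c in n_docs
    }
    return vocab, prior, cond


def classify_bernoulli_nb(tokens, model):
    """Return the class with the highest log-posterior score."""
    vocab, prior, cond = model
    present = set(tokens)
    best_class, best_score = None, -math.inf
    for c in prior:
        score = math.log(prior[c])
        for t in vocab:
            p = cond[c][t]
            # Absent terms contribute (1 - p): the Bernoulli-specific part.
            score += math.log(p if t in present else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy usage with made-up English tokens standing in for Arabic terms.
toy_docs = [
    ["football", "match", "goal"],
    ["goal", "team", "win"],
    ["bank", "market", "stock"],
    ["stock", "price", "bank"],
]
toy_labels = ["sport", "sport", "economy", "economy"]
toy_model = train_bernoulli_nb(toy_docs, toy_labels)
print(classify_bernoulli_nb(["goal", "team"], toy_model))
```

Unlike the multinomial variant of Naïve Bayes, this model ignores how often a term occurs and explicitly penalizes a class for vocabulary terms that are missing from the document.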

Experiments were conducted on a corpus of more than 1445 Arabic documents classified into nine categories.

The research evaluates these techniques using the standard measures of recall, precision, and F-measure as the basis of comparison.
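The evaluation measures can be sketched as follows. This is a minimal illustration that assumes macro-averaging means computing precision, recall, and F-measure per category and then taking the unweighted mean. The record does not state which F-measure variant the thesis uses, so the balanced F1 here is an assumption, and the function name `macro_scores` and the toy labels are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Return (macro-precision, macro-recall, macro-F1) over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Guard against empty denominators for classes never predicted or seen.
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy usage: four documents, two categories, one misclassification.
macro_p, macro_r, macro_f = macro_scores(
    ["a", "a", "b", "b"],
    ["a", "b", "b", "b"],
)
print(round(macro_p, 3), round(macro_r, 3), round(macro_f, 3))
```

Macro-averaging gives every category equal weight regardless of how many documents it contains, which matters when the categories in a corpus are unevenly sized.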

The experiments showed that NB using the MBM classifier is highly effective.

It outperformed the k-NN and Rocchio classifiers.

MBM macro-precision and macro-recall reached 0.86 and 0.831, respectively. In general, the Naïve Bayes algorithm using MBM outperformed the other two classifiers, k-NN and Rocchio.

The Naïve Bayes algorithm using the MBM classifier has the best precision.

The Rocchio classifier comes in second place.

The worst classifier on this data set was the k-NN classifier. The results could be slightly better if the number of documents were increased to 5070.

Main Subjects

Information Technology and Computer Science

No. of Pages

67

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Literature review.

Chapter Three: The methodology.

Chapter Four: Experiments and evaluation.

Chapter Five: Conclusion and future work.

References.

American Psychological Association (APA)

al-Arqat, Latifah Faraj. (2013). Arabic text classification techniques using the multivariate Bernoulli model. (Master's thesis). Amman Arab University, Jordan
https://search.emarefa.net/detail/BIM-529295

Modern Language Association (MLA)

al-Arqat, Latifah Faraj. Arabic text classification techniques using the multivariate Bernoulli model. (Master's thesis). Amman Arab University. (2013).
https://search.emarefa.net/detail/BIM-529295

American Medical Association (AMA)

al-Arqat, Latifah Faraj. (2013). Arabic text classification techniques using the multivariate Bernoulli model. (Master's thesis). Amman Arab University, Jordan
https://search.emarefa.net/detail/BIM-529295

Language

English

Data Type

Arab Theses

Record ID

BIM-529295