Automatic Arabic text categorization using efficient classification techniques

Other Title(s)

التصنيف التلقائي للنصوص العربية باستخدام تقنيات التصنيف ذات الكفاءة

Dissertant

al-Awadi, Muhammad Mahmud

Thesis advisor

Hammad, Mustafa Muhammad

Committee Members

al-Hasanat, Ahmad Bashir
al-Maani, Mudir Musa
al-Hammuri, Awni Mansur

University

Mutah University

Faculty

Information Technology College

University Country

Jordan

Degree

Master

Degree Date

2015

English Abstract

Arabic is a complex language that needs special treatment.

However, most previous studies used statistical methods for Arabic text classification, and these methods neglect the meaning of the terms.

First, we built an identical Arabic database and made it freely available for research on the Arabic language. We then designed a framework for preprocessing Arabic text, consisting of multiple steps and modeling techniques, such as stop-word removal and a stemmer, to improve the results of Arabic text categorization.

This thesis focuses on semantic techniques and proposes a hybrid stemmer for the Arabic language.

Various techniques are used to implement Arabic text classification and to verify our hybrid stemmer.

These techniques include Latent Semantic Analysis (LSA) and five machine learning approaches.

LSA is used to reduce dimensionality in order to improve the accuracy of categorization systems.

The experimental results showed the effectiveness of our Arabic stemmer in terms of classification accuracy and speed.

The best performance was achieved by combining Singular Value Decomposition (SVD) with the cosine similarity measure and Manhattan distance. Finally, we experimentally compared Hassanat's distance with Euclidean distance, Manhattan distance, and cosine distance to choose the best way to calculate the similarity between vectors across five text representations.
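The four vector measures named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. The Hassanat distance below follows the commonly published definition as recalled here and should be treated as an assumption. The category labels, the example vectors, and the helper name `nearest` are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention for an all-zero vector: maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def hassanat(a, b):
    """Hassanat distance (assumed form): each dimension contributes a
    bounded value in [0, 1), so no single feature dominates the sum."""
    total = 0.0
    for x, y in zip(a, b):
        lo, hi = min(x, y), max(x, y)
        if lo >= 0:
            total += 1.0 - (1.0 + lo) / (1.0 + hi)
        else:
            # Shift both values by |lo| so the ratio stays well defined.
            shift = abs(lo)
            total += 1.0 - (1.0 + lo + shift) / (1.0 + hi + shift)
    return total


def nearest(query, labeled_vectors, distance):
    """Hypothetical helper: label of the training vector closest to query."""
    return min(labeled_vectors, key=lambda item: distance(query, item[1]))[0]


# Hypothetical term-weight vectors for two categories and one query document.
training = [("sports", [2.0, 0.0, 1.0]), ("economy", [0.0, 3.0, 1.0])]
query = [1.0, 0.0, 1.0]

for measure in (euclidean, manhattan, cosine_distance, hassanat):
    print(measure.__name__, nearest(query, training, measure))
```

In a comparison like the one the abstract describes, each measure would be plugged into the same classifier over each text representation, so that only the distance function changes between runs.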

Main Subjects

Information Technology and Computer Science

No. of Pages

109

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction.

Chapter Two : The background.

Chapter Three : The proposed Arabic stemmer : Arabic text preprocessing.

Chapter Four : Experiments and results.

References.

American Psychological Association (APA)

al-Awadi, Muhammad Mahmud. (2015). Automatic Arabic text categorization using efficient classification techniques. (Master's thesis). Mutah University, Jordan
https://search.emarefa.net/detail/BIM-729773

Modern Language Association (MLA)

al-Awadi, Muhammad Mahmud. Automatic Arabic text categorization using efficient classification techniques. (Master's thesis). Mutah University. (2015).
https://search.emarefa.net/detail/BIM-729773

American Medical Association (AMA)

al-Awadi, Muhammad Mahmud. (2015). Automatic Arabic text categorization using efficient classification techniques. (Master's thesis). Mutah University, Jordan
https://search.emarefa.net/detail/BIM-729773

Language

English

Data Type

Arab Theses

Record ID

BIM-729773