Investigating the impacts of semantic features on Arabic text classification

Other Title(s)

بحث تأثير الخصائص الدلالية على تصنيف النصوص العربية

Dissertant

Ahmad, Inas Sidqi al-Hajj

Thesis advisor

Ujan, Arafat

Committee Members

Hammu, Bassam
Dawud, Dawud
al-Kuz, Akram

University

Princess Sumaya University for Technology

Faculty

King Hussein Faculty for Computing Sciences

Department

Department of Computer Sciences

University Country

Jordan

Degree

Master

Degree Date

2016

English Abstract

Text classification (TC) is the process of assigning documents to predefined categories based on their content.

Reducing the dimensionality of texts affects classification performance.

Most dimensionality reduction techniques ignore the semantic content of texts and focus only on the amount of reduction.

This research investigates the impact of some reduction techniques on Arabic text classification and proposes a new method that takes into account the semantic relationships that may exist among words and terms, such as Arabic Named Entities (ANEs) and synonyms.

In addition, the proposed method takes into account the linguistic features of the Arabic language.

The method replaces every ANE that appears in the text with its reference according to the linguistic resource, then applies feature clustering (the stem synonym grouping method and the root grouping method) to merge similar and related stems without ignoring semantic relationships, building a Semantic Vector Space Model (SVSM).
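The two steps described above (ANE replacement, then stem and root grouping) can be sketched roughly as follows. This is only an illustration under stated assumptions: the lookup tables `ANE_REFERENCES` and `STEM_GROUPS`, the transliterated tokens, and all function names are hypothetical stand-ins for the linguistic resources used in the thesis, and a plain term-count vector is assumed in place of the actual SVSM weighting.

```python
from collections import Counter

# Hypothetical resources (transliterated for readability); none of these
# entries come from the thesis itself.
ANE_REFERENCES = {
    ("al-umam", "al-muttahida"): "ANE_UN",
    ("munazzamat", "al-umam", "al-muttahida"): "ANE_UN",
}
STEM_GROUPS = {
    "mashfa": "mustashfa",  # synonym merged into one representative stem
    "tabib": "tbb",         # stem merged into its (assumed) root
    "atibba": "tbb",
}


def replace_anes(tokens, references):
    """Replace each known entity mention with its reference, longest match first."""
    max_len = max((len(key) for key in references), default=0)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in references:
                out.append(references[key])
                i += n
                break
        else:
            # No entity starts at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def group_features(tokens, groups):
    """Map each token to its group representative, if it has one."""
    return [groups.get(token, token) for token in tokens]


def semantic_vector(tokens):
    """Build a term-count vector after ANE replacement and feature grouping."""
    replaced = replace_anes(tokens, ANE_REFERENCES)
    return Counter(group_features(replaced, STEM_GROUPS))
```

In this sketch, different surface forms of the same entity and different stems of the same group collapse into a single feature, which is where the reduction in dimensionality would come from.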

The experiments use an in-house collected dataset of 332 documents belonging to four categories: Economy, Politics, Health, and Technology.

The dataset is split into two parts: 600 KB (62% of the dataset's files) for training the system, with 150 KB for each category, and the remaining 38% of the files for testing.

Dimension reduction ratio (DRR) is used to measure the reduction rate.
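The abstract does not state the formula for DRR. A minimal sketch, assuming the common definition (the fraction of distinct features removed by the reduction step), might look like this; the function name and signature are illustrative only.

```python
def dimension_reduction_ratio(original_features, reduced_features):
    """Fraction of distinct original features removed by the reduction step.

    Assumed definition: (|original| - |reduced|) / |original|.
    """
    original = len(set(original_features))
    reduced = len(set(reduced_features))
    if original == 0:
        raise ValueError("original feature set is empty")
    return (original - reduced) / original
```

Under this assumed definition, reducing four distinct features to three gives a ratio of 0.25.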

Precision, recall, and F-measure are used to evaluate the classification results.
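For reference, these three measures can be computed per category from the true and predicted labels. The sketch below uses the standard definitions with the balanced F-measure (F1); the abstract does not say which F-measure variant or averaging scheme the thesis uses, so that choice is an assumption, as is the function name.

```python
def precision_recall_f1(y_true, y_pred, category):
    """Per-category precision, recall, and balanced F-measure (F1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)

    # Guard against empty denominators when a category is never predicted
    # or never occurs in the test set.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Precision is the share of documents assigned to a category that truly belong to it, recall is the share of the category's documents that were found, and F1 is their harmonic mean.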

The experimental results show that the proposed method not only improves classification accuracy with a support vector machine (SVM) classifier but also reduces the number of text features by about 3%-5%.

Main Subjects

Information Technology and Computer Science

No. of Pages

116

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Literature.

Chapter Three: Background.

Chapter Four: Proposed methodology.

Chapter Five: Experimental results and analysis.

References.

American Psychological Association (APA)

Ahmad, Inas Sidqi al-Hajj. (2016). Investigating the impacts of semantic features on Arabic text classification. (Master's thesis). Princess Sumaya University for Technology, Jordan
https://search.emarefa.net/detail/BIM-693669

Modern Language Association (MLA)

Ahmad, Inas Sidqi al-Hajj. Investigating the impacts of semantic features on Arabic text classification. (Master's thesis). Princess Sumaya University for Technology. (2016).
https://search.emarefa.net/detail/BIM-693669

American Medical Association (AMA)

Ahmad, Inas Sidqi al-Hajj. (2016). Investigating the impacts of semantic features on Arabic text classification. (Master's thesis). Princess Sumaya University for Technology, Jordan
https://search.emarefa.net/detail/BIM-693669

Language

English

Data Type

Arab Theses

Record ID

BIM-693669