Investigating the impacts of semantic features on Arabic text classification

Other titles

بحث تأثير الخصائص الدلالية على تصنيف النصوص العربية

Thesis author

Ahmad, Inas Sidqi al-Hajj

Thesis supervisor

Ujan, Arafat

Committee members

Hammu, Bassam
Dawud, Dawud
al-Kuz, Akram

University

Princess Sumaya University for Technology

Faculty

King Hussein School of Computing Sciences

Academic department

Department of Computer Science

University country

Jordan

Degree

Master's

Degree date

2016

English abstract

Text Classification (TC) is the process of classifying documents into predefined categories based on their content.

Reducing the dimensionality of texts affects classification performance.

Most dimensionality reduction techniques ignore the semantic content of texts and focus only on the amount of reduction.

This research investigates the impact of some reduction techniques on Arabic text classification and proposes a new method that addresses this problem by taking into account the semantic relationships that may exist among words and terms, such as Arabic Named Entities (ANEs) and synonyms.

In addition, the proposed method takes into account the linguistic features of the Arabic language.

The method replaces every ANE that appears in the text with its reference according to a linguistic resource, then applies feature clustering (a stem synonym grouping method and a root grouping method) to merge similar and related stems without ignoring semantic relationships, by building a Semantic Vector Space Model (SVSM).
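
The two steps described above (entity replacement, then stem grouping) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the lookup tables `ENTITY_REFERENCE` and `SYNONYM_GROUP`, the helper names, and the toy transliterated tokens are all hypothetical stand-ins for the linguistic resource and the grouping methods that the abstract mentions but does not specify.

```python
from collections import Counter

# Hypothetical lookup tables (assumptions, not taken from the thesis).
# Each named-entity variant maps to one canonical reference.
ENTITY_REFERENCE = {"urdun": "ENT_JORDAN", "al-urdun": "ENT_JORDAN"}
# Each stem maps to the representative of its synonym group.
SYNONYM_GROUP = {"hakim": "tabib", "tabib": "tabib"}


def normalize(tokens):
    """Replace entity variants, then merge synonymous stems."""
    normalized = []
    for token in tokens:
        token = ENTITY_REFERENCE.get(token, token)
        token = SYNONYM_GROUP.get(token, token)
        normalized.append(token)
    return normalized


def semantic_vector(tokens):
    """Term-frequency vector over the merged features."""
    return Counter(normalize(tokens))


# Toy document of already-stemmed, transliterated tokens.
doc = ["tabib", "hakim", "urdun", "al-urdun", "mustashfa"]
raw_feature_count = len(set(doc))
vector = semantic_vector(doc)
merged_feature_count = len(vector)
```

In this toy case the five distinct raw tokens collapse to three features, while the counts of the merged tokens are preserved in the shared feature rather than discarded.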

The experiments use an in-house collected dataset of 332 documents belonging to four categories: Economy, Politics, Health, and Technology.

The dataset is split into two parts: 600 KB (62% of the dataset files) for training the system, with 150 KB per category, and the remaining 38% of the files for testing.

Dimension reduction ratio (DRR) is used to measure the reduction rate.
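
The abstract does not state how DRR is computed. A minimal sketch, assuming the common definition as the percentage of features removed relative to the original feature count (the function name and the example counts are hypothetical):

```python
def dimension_reduction_ratio(original_features, reduced_features):
    """Percentage of features removed, under the assumed definition
    DRR = 100 * (original - reduced) / original."""
    if original_features <= 0:
        raise ValueError("original_features must be positive")
    return 100 * (original_features - reduced_features) / original_features


# Hypothetical counts, chosen only to illustrate the arithmetic.
drr = dimension_reduction_ratio(1000, 960)
```

With these illustrative counts, removing 40 of 1000 features gives a DRR of 4%.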

Precision, recall, and F-measure are used to evaluate the classification results.
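
For reference, these measures can be computed per category from true and predicted labels. The sketch below uses the standard definitions and assumes the balanced F-measure (F1), which the abstract does not specify; the labels in the example are invented for illustration and are not results from the thesis.

```python
def precision_recall_f1(y_true, y_pred, category):
    """Per-category precision, recall, and F1 from parallel label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == category and p == category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Invented labels for illustration only.
y_true = ["Health", "Health", "Economy", "Politics"]
y_pred = ["Health", "Economy", "Economy", "Health"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, "Health")
```

Here the Health category has one true positive, one false positive, and one false negative, so all three measures equal 0.5.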

The experimental results show that the proposed method not only improves classification accuracy with a support vector machine (SVM) classifier but also reduces the number of text features by about 3%-5%.

Main subjects

Information Technology and Computer Science

Number of pages

116

Table of contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Literature.

Chapter Three: Background.

Chapter Four: Proposed methodology.

Chapter Five: Experimental results and analysis.

References.

American Psychological Association (APA) citation style

Ahmad, Inas Sidqi al-Hajj. (2016). Investigating the impacts of semantic features on Arabic text classification. (Master's thesis). Princess Sumaya University for Technology, Jordan
https://search.emarefa.net/detail/BIM-693669

Modern Language Association (MLA) citation style

Ahmad, Inas Sidqi al-Hajj. Investigating the impacts of semantic features on Arabic text classification. (Master's thesis). Princess Sumaya University for Technology. (2016).
https://search.emarefa.net/detail/BIM-693669

American Medical Association (AMA) citation style

Ahmad, Inas Sidqi al-Hajj. (2016). Investigating the impacts of semantic features on Arabic text classification. (Master's thesis). Princess Sumaya University for Technology, Jordan
https://search.emarefa.net/detail/BIM-693669

Text language

English

Data type

Theses and dissertations

Record number

BIM-693669