Machine learning and feature selection approaches for categorizing Arabic text : analysis, comparison, and proposal

Other titles

طرق تعلم الآلة واستخلاص الصفات لتصنيف النصوص العربية : تحليل و دراسة مقارنة و إقتراح

Co-authors

al-Fishawi, Nawal
al-Nahhas, Ayat
Tulbah, Maha
Muhamma, Nur

Source

The Egyptian Journal of Language Engineering

Issue

Vol. 7, no. 2 (30 September 2020), pp. 1-19, 19 p.

Publisher

Egyptian Society of Language Engineering

Publication date

2020-09-30

Country of publication

Egypt

Number of pages

19

Main subjects

Information Technology and Computer Science

Topics

Abstract EN

This work adopts several classification approaches for categorizing Arabic text.

The approaches are applied to two datasets used as test-beds.

A comparative study evaluates the performance of the adopted classifiers.

Some feature selection methods are also analyzed, investigated, and evaluated.

Selecting the most significant features is important because a very large number of features can degrade text classification performance.

The adopted feature selection methods are compared for classifying Arabic documents.


Moreover, the feature selection approaches are modified by combining the chosen methods.

A novel method is also proposed for selecting the most appropriate features.

The method constructs the features based on semantic fusion and multiple-words (SF-MW).

The proposed method is compared with the adopted feature selection methods.


The experimental results show that the SVM classifier performed best, ahead of the KNN and NB classifiers.

Combining the adopted feature selection methods gives better results than using each method individually.

The proposed feature selection method (SF-MW) is promising, as it reduced the number of features and achieved higher classification accuracy.

The accuracy improvement was about 22% for the two chosen Arabic test-beds, which contain 1246 and 1500 documents respectively.

The proposed method is also expected to be efficient for other Arabic and English datasets.
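
The abstract does not name the individual feature selection methods or say how they were combined. As a rough illustration only, the sketch below assumes two common criteria (document frequency and chi-square) and assumes that "combination" means taking the union of each criterion's top-k terms. The toy corpus, the function names, and the choice of k are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy corpus of (tokens, label) pairs. Real input would be
# tokenized Arabic documents; these words are placeholders.
DOCS = [
    (["match", "goal", "team"], "sport"),
    (["team", "league", "goal"], "sport"),
    (["market", "bank", "price"], "economy"),
    (["bank", "price", "team"], "economy"),
]


def document_frequency(docs):
    """Count how many documents contain each term."""
    df = Counter()
    for tokens, _ in docs:
        df.update(set(tokens))
    return dict(df)


def chi_square(docs, target):
    """Chi-square score of each term against one target class."""
    n = len(docs)
    vocab = {t for tokens, _ in docs for t in tokens}
    scores = {}
    for term in vocab:
        # 2x2 contingency table: term present/absent vs. target/other class.
        a = sum(1 for toks, lab in docs if term in toks and lab == target)
        b = sum(1 for toks, lab in docs if term in toks and lab != target)
        c = sum(1 for toks, lab in docs if term not in toks and lab == target)
        d = n - a - b - c
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def combined_features(docs, target, k):
    """Assumed combination rule: union of the top-k terms of both criteria."""
    by_df = top_k(document_frequency(docs), k)
    by_chi = top_k(chi_square(docs, target), k)
    return by_df | by_chi


if __name__ == "__main__":
    print(sorted(combined_features(DOCS, "sport", 2)))
```

A union keeps any term that either criterion ranks highly, so the combined set can be larger than k; an intersection or a weighted rank would be equally plausible readings of the abstract.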

American Psychological Association (APA) citation style

al-Nahhas, Ayat & Muhamma, Nur & al-Fishawi, Nawal & Tulbah, Maha. 2020. Machine learning and feature selection approaches for categorizing Arabic text : analysis, comparison, and proposal. The Egyptian Journal of Language Engineering, Vol. 7, no. 2, pp. 1-19.
https://search.emarefa.net/detail/BIM-1012024

Modern Language Association (MLA) citation style

al-Nahhas, Ayat … [et al.]. Machine learning and feature selection approaches for categorizing Arabic text : analysis, comparison, and proposal. The Egyptian Journal of Language Engineering Vol. 7, no. 2 (Sep. 2020), pp. 1-19.
https://search.emarefa.net/detail/BIM-1012024

American Medical Association (AMA) citation style

al-Nahhas, Ayat & Muhamma, Nur & al-Fishawi, Nawal & Tulbah, Maha. Machine learning and feature selection approaches for categorizing Arabic text : analysis, comparison, and proposal. The Egyptian Journal of Language Engineering. 2020. Vol. 7, no. 2, pp. 1-19.
https://search.emarefa.net/detail/BIM-1012024

Data type

Articles

Text language

English

Notes

-

Record number

BIM-1012024