An automated Arabic text categorization based on the frequency ratio accumulation

Co-authors

Sharif, Bara
Umar, Nazliya
Sharif, Ziyad

Source

The International Arab Journal of Information Technology

Issue

Volume 11, Issue 2 (31 March 2014), 10 p.

Publisher

Zarqa University

Publication date

2014-03-31

Country of publication

Jordan

Number of pages

10

Main subjects

Information Technology and Computer Science

Topics

Abstract EN

Compared with other languages, research on automated Arabic text categorization (TC) remains limited, owing to the complex and rich nature of the Arabic language.

Most of this research applies supervised machine learning approaches such as Naïve Bayes, K-Nearest Neighbor, Support Vector Machine, and Decision Tree.

Most of these techniques have complex mathematical models and do not usually lead to accurate results for Arabic TC.

Moreover, all previous research tended to treat feature selection and classification as independent problems in automatic TC, which increased cost and computational complexity.

This creates a need for new techniques suited to the Arabic language and its complex morphology.

This study applies an approach that is new to Arabic TC, the Frequency Ratio Accumulation Method (FRAM), which has a simple mathematical model.

The categorization task is combined with a feature processing task.
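
The abstract does not state the FRAM formula, so the following is only a minimal sketch under an assumption: that a term's frequency ratio for a category is its relative frequency in that category divided by the sum of its relative frequencies over all categories, and that a document is assigned to the category with the largest accumulated ratio. The function names `train_fram` and `classify_fram` and the toy English tokens (standing in for Arabic unigrams after tokenization) are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def train_fram(docs):
    """docs: iterable of (tokens, category) pairs.

    Returns {term: {category: frequency_ratio}}.
    ASSUMPTION: the ratio definition below is illustrative, not quoted
    from the paper.
    """
    counts = defaultdict(Counter)  # category -> term -> raw count
    for tokens, category in docs:
        counts[category].update(tokens)

    # Relative frequency of each term inside each category.
    rel = {}
    for category, cnt in counts.items():
        total = sum(cnt.values())
        rel[category] = {term: n / total for term, n in cnt.items()}

    # Sum of a term's relative frequencies over all categories.
    term_totals = Counter()
    for per_term in rel.values():
        for term, r in per_term.items():
            term_totals[term] += r

    # Frequency ratio: the share of that sum belonging to each category.
    ratios = defaultdict(dict)
    for category, per_term in rel.items():
        for term, r in per_term.items():
            ratios[term][category] = r / term_totals[term]
    return dict(ratios)


def classify_fram(ratios, tokens, categories):
    """Accumulate the ratios of the document's terms and pick the best category."""
    scores = {c: 0.0 for c in categories}
    for term in tokens:
        for category, ratio in ratios.get(term, {}).items():
            scores[category] += ratio
    return max(scores, key=scores.get)


train = [
    (["goal", "match", "team"], "sports"),
    (["bank", "market", "goal"], "economy"),
]
model = train_fram(train)
predicted = classify_fram(model, ["match", "goal"], ["sports", "economy"])
```

In this sketch every term contributes to the score, weighted by how specific it is to each category, so no separate feature selection pass is run. That is one possible reading of how categorization and feature processing can be combined.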

The main aim of this research is to address automatic Arabic TC by investigating the Frequency Ratio Accumulation Method as a way to enhance the performance of the Arabic TC model.

The performance of the FRAM classifier is compared with that of three classifiers based on Bayes' theorem: Simple Naïve Bayes, Multivariate Bernoulli Naïve Bayes, and Multinomial Naïve Bayes.

Based on the findings of the study, FRAM outperformed the state-of-the-art methods.

It achieved a macro-F1 value of 95.1% using a unigram word-level representation method.
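
For reference, macro-F1 is the unweighted mean of the per-category F1 scores, so small categories count as much as large ones. The sketch below shows the standard computation on made-up labels. The function name `macro_f1` and the sample data are illustrative only and do not reproduce the paper's evaluation.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels (hypothetical): one "a" document is misclassified as "b".
gold = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]
score = macro_f1(gold, pred)
```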

American Psychological Association (APA) citation style

Sharif, Bara & Umar, Nazliya & Sharif, Ziyad. 2014. An automated Arabic text categorization based on the frequency ratio accumulation. The International Arab Journal of Information Technology, Vol. 11, no. 2.
https://search.emarefa.net/detail/BIM-334272

Modern Language Association (MLA) citation style

Sharif, Bara…[et al.]. An automated Arabic text categorization based on the frequency ratio accumulation. The International Arab Journal of Information Technology Vol. 11, no. 2 (Mar. 2014).
https://search.emarefa.net/detail/BIM-334272

American Medical Association (AMA) citation style

Sharif, Bara & Umar, Nazliya & Sharif, Ziyad. An automated Arabic text categorization based on the frequency ratio accumulation. The International Arab Journal of Information Technology. 2014. Vol. 11, no. 2.
https://search.emarefa.net/detail/BIM-334272

Data type

Articles

Text language

English

Notes

Includes bibliographical references

Record number

BIM-334272