Clustering with probabilistic topic models on Arabic texts : a comparative study of LDA and K-means

Co-authors

Kelaiaia, Abd al-Salam
Merouani, Hayah

Source

The International Arab Journal of Information Technology

Issue

Vol. 13, no. 2 (31 March 2016), 7 p.

Publisher

Zarqa University

Publication date

2016-03-31

Country of publication

Jordan

Number of pages

7

Main subjects

Information Technology and Computer Science
Arabic Language and Literature

Topics

Abstract (EN)

Probabilistic topic models such as Latent Dirichlet Allocation (LDA) have been widely used in many text mining tasks, such as retrieval, summarization, and clustering, across different languages.

In this paper, we present a first comparative study of LDA and K-means, two well-known methods for topic identification and clustering, respectively, applied to Arabic texts.

Our aim is to compare how the morpho-syntactic characteristics of the Arabic language influence the performance of the first method relative to the second.

To study different aspects of these methods, the study is conducted on four benchmark document collections, with clustering quality measured by four well-known evaluation measures: Rand index, Jaccard index, F-measure, and entropy.

The results consistently show that LDA performs better than K-means in most cases.
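Two of the evaluation measures named in the abstract, the Rand index and the Jaccard index, are commonly defined by counting pairs of documents on which a clustering agrees or disagrees with reference classes. The following is a minimal sketch of those standard pair-counting definitions; it is an assumption that the paper uses exactly these forms, and the function names and toy labels are hypothetical illustrations, not taken from the article.

```python
from itertools import combinations


def pair_counts(true_labels, pred_labels):
    """Classify every document pair by how the two labelings treat it.

    Returns (ss, sd, ds, dd):
      ss: same class and same cluster
      sd: same class, different clusters
      ds: different classes, same cluster
      dd: different classes and different clusters
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            ss += 1
        elif same_true:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_index(true_labels, pred_labels):
    """Fraction of pairs on which the two labelings agree."""
    ss, sd, ds, dd = pair_counts(true_labels, pred_labels)
    total = ss + sd + ds + dd
    return (ss + dd) / total if total else 1.0


def jaccard_index(true_labels, pred_labels):
    """Like the Rand index, but ignoring pairs separated in both labelings."""
    ss, sd, ds, _ = pair_counts(true_labels, pred_labels)
    denom = ss + sd + ds
    return ss / denom if denom else 1.0


# Hypothetical toy data: four documents, two reference classes.
reference = [0, 0, 1, 1]
clusters = [0, 0, 1, 0]
print(rand_index(reference, clusters))     # 0.5
print(jaccard_index(reference, clusters))  # 0.25
```

Both measures depend only on which documents are grouped together, so they are unaffected by how cluster identifiers are numbered. This makes them applicable to the output of either method, for example by assigning each document to its most probable LDA topic (one common convention, assumed here).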

American Psychological Association (APA) citation style

Kelaiaia, Abd al-Salam & Merouani, Hayah. 2016. Clustering with probabilistic topic models on Arabic texts : a comparative study of LDA and K-means. The International Arab Journal of Information Technology, Vol. 13, no. 2.
https://search.emarefa.net/detail/BIM-580942

Modern Language Association (MLA) citation style

Kelaiaia, Abd al-Salam & Merouani, Hayah. Clustering with probabilistic topic models on Arabic texts : a comparative study of LDA and K-means. The International Arab Journal of Information Technology Vol. 13, no. 2 (Mar. 2016).
https://search.emarefa.net/detail/BIM-580942

American Medical Association (AMA) citation style

Kelaiaia, Abd al-Salam & Merouani, Hayah. Clustering with probabilistic topic models on Arabic texts : a comparative study of LDA and K-means. The International Arab Journal of Information Technology. 2016. Vol. 13, no. 2.
https://search.emarefa.net/detail/BIM-580942

Data type

Articles

Text language

English

Notes

Includes bibliographical references

Record number

BIM-580942