Clustering with probabilistic topic models on Arabic texts : a comparative study of lda and k-means

Joint Authors

Kelaiaia, Abd al-Salam
Merouani, Hayah

Source

The International Arab Journal of Information Technology

Issue

Vol. 13, Issue 2 (31 Mar. 2016), 7 p.

Publisher

Zarqa University

Publication Date

2016-03-31

Country of Publication

Jordan

No. of Pages

7

Main Subjects

Information Technology and Computer Science
Arabic language and Literature

Abstract EN

Probabilistic topic models such as Latent Dirichlet Allocation (LDA) have been widely used in many text mining tasks, such as retrieval, summarization and clustering, across different languages.

In this paper, we present a first comparative study of LDA and K-means, two well-known methods for topic identification and clustering respectively, applied to Arabic texts.

Our aim is to compare how the morpho-syntactic characteristics of the Arabic language influence the performance of the first method relative to the second.

To examine different aspects of these methods, the study is conducted on four benchmark document collections, and clustering quality is measured with four well-known evaluation measures: Rand index, Jaccard index, F-measure and Entropy.

The results consistently show that LDA performs better than K-means in most cases.
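
The four evaluation measures named in the abstract can be illustrated with a minimal sketch. This record does not give the paper's exact formulas, so the definitions below are assumptions: the Rand and Jaccard indexes use the standard pair-counting forms, the F-measure is the pairwise variant (the paper may use a class-based one), and Entropy is the size-weighted average of per-cluster class entropy. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def pair_counts(truth, pred):
    """Count document pairs by agreement between reference classes and clusters."""
    same_both = truth_only = pred_only = diff_both = 0
    for i, j in combinations(range(len(truth)), 2):
        same_t = truth[i] == truth[j]
        same_p = pred[i] == pred[j]
        if same_t and same_p:
            same_both += 1
        elif same_t:
            truth_only += 1
        elif same_p:
            pred_only += 1
        else:
            diff_both += 1
    return same_both, truth_only, pred_only, diff_both


def rand_index(truth, pred):
    """Fraction of pairs on which the two partitions agree (higher is better)."""
    a, b, c, d = pair_counts(truth, pred)
    total = a + b + c + d
    return (a + d) / total if total else 1.0


def jaccard_index(truth, pred):
    """Like the Rand index, but ignores pairs separated in both partitions."""
    a, b, c, _ = pair_counts(truth, pred)
    return a / (a + b + c) if (a + b + c) else 1.0


def pairwise_f_measure(truth, pred):
    """Harmonic mean of pairwise precision and recall (assumed variant)."""
    a, b, c, _ = pair_counts(truth, pred)
    if a == 0:
        return 0.0
    precision = a / (a + c)
    recall = a / (a + b)
    return 2 * precision * recall / (precision + recall)


def clustering_entropy(truth, pred):
    """Size-weighted average class entropy per cluster (lower is better)."""
    n = len(truth)
    total = 0.0
    for cluster in set(pred):
        members = [t for t, p in zip(truth, pred) if p == cluster]
        size = len(members)
        h = -sum((k / size) * log2(k / size) for k in Counter(members).values())
        total += (size / n) * h
    return total


# Toy example: four documents, two reference classes, one misplaced document.
truth = [0, 0, 1, 1]
pred = [0, 0, 1, 0]
scores = {
    "rand": rand_index(truth, pred),
    "jaccard": jaccard_index(truth, pred),
    "f_measure": pairwise_f_measure(truth, pred),
    "entropy": clustering_entropy(truth, pred),
}
```

To compare two methods in this way, the same functions would be applied to the cluster labels each method produces on the same collection. For LDA, a common convention (assumed here, not confirmed by this record) is to assign each document to its highest-probability topic.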

American Psychological Association (APA)

Kelaiaia, Abd al-Salam & Merouani, Hayah. 2016. Clustering with probabilistic topic models on Arabic texts : a comparative study of lda and k-means. The International Arab Journal of Information Technology, Vol. 13, no. 2.
https://search.emarefa.net/detail/BIM-580942

Modern Language Association (MLA)

Kelaiaia, Abd al-Salam & Merouani, Hayah. Clustering with probabilistic topic models on Arabic texts : a comparative study of lda and k-means. The International Arab Journal of Information Technology Vol. 13, no. 2 (Mar. 2016).
https://search.emarefa.net/detail/BIM-580942

American Medical Association (AMA)

Kelaiaia, Abd al-Salam & Merouani, Hayah. Clustering with probabilistic topic models on Arabic texts : a comparative study of lda and k-means. The International Arab Journal of Information Technology. 2016. Vol. 13, no. 2.
https://search.emarefa.net/detail/BIM-580942

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-580942