Clustering with probabilistic topic models on Arabic texts: a comparative study of LDA and K-means
Joint Authors
Kelaiaia, Abd al-Salam
Merouani, Hayah
Source
The International Arab Journal of Information Technology
Issue
Vol. 13, Issue 2 (31 Mar. 2016), 7 p.
Publisher
Publication Date
2016-03-31
Country of Publication
Jordan
No. of Pages
7
Main Subjects
Information Technology and Computer Science
Arabic language and Literature
Topics
Abstract EN
Probabilistic topic models such as Latent Dirichlet Allocation (LDA) have been widely used in many text mining tasks, such as retrieval, summarization, and clustering, across different languages.
In this paper, we present a first comparative study of LDA and K-means, two well-known methods for topic identification and clustering respectively, applied to Arabic texts.
Our aim is to compare how the morpho-syntactic characteristics of the Arabic language influence the performance of the first method relative to the second.
To study different aspects of these methods, the study is conducted on four benchmark document collections, and clustering quality is measured with four well-known evaluation measures: the Rand index, the Jaccard index, the F-measure, and entropy.
The results consistently show that LDA outperforms K-means in most cases.
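The abstract compares LDA and K-means as document clustering methods but does not say how LDA's output is turned into clusters. The sketch below assumes a common choice: fit LDA by collapsed Gibbs sampling and assign each document to its most probable topic. K-means runs on L2-normalized term-frequency vectors. The function names `lda_cluster` and `kmeans_cluster`, the hyperparameters `alpha`, `beta`, and `iters`, and the assumption that documents arrive already tokenized (Arabic normalization and stemming are not shown) are all illustrative, not the paper's setup.

```python
import math
import random
from collections import Counter


def lda_cluster(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical helper: cluster tokenized docs with LDA (collapsed Gibbs sampling).

    Each document goes to the topic with the largest count in its document-topic
    table, i.e. the argmax of theta_d, which is proportional to n_dk + alpha.
    alpha, beta and iters are illustrative values, not the paper's settings.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]       # tokens of document d assigned to topic t
    nkw = [[0] * V for _ in range(k)]   # occurrences of word v assigned to topic t
    nk = [0] * k                        # total tokens assigned to topic t
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial topic assignments
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            ndk[d][t] += 1
            nkw[t][vocab[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = vocab[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][v] -= 1
                nk[t] -= 1
                # Unnormalized full conditional p(z_i = s | all other assignments).
                weights = [(ndk[d][s] + alpha) * (nkw[s][v] + beta) / (nk[s] + V * beta)
                           for s in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                # Record the resampled topic and add the token back.
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][v] += 1
                nk[t] += 1
    return [max(range(k), key=lambda t: ndk[d][t]) for d in range(len(docs))]


def kmeans_cluster(docs, k, iters=100, seed=0):
    """Hypothetical helper: K-means on L2-normalized term-frequency vectors."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        v = [tf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against empty documents
        vecs.append([x / norm for x in v])

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [list(c) for c in rng.sample(vecs, k)]  # k randomly chosen documents as seeds
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vecs]
        if new == labels:
            break  # assignments unchanged: converged
        labels = new
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Both functions return one cluster id per document, so their outputs can be scored against the same reference categories. L2-normalizing the term-frequency vectors keeps long documents from dominating the Euclidean distances; TF-IDF weighting is a common alternative, and the abstract does not say which representation the paper uses.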
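The four evaluation measures named in the abstract have several variants in the literature. The sketch below assumes common definitions, which may differ in detail from the paper's: pair-counting Rand and Jaccard indices, the class-size-weighted best-match F-measure, and the cluster-size-weighted entropy in bits. Higher is better for the first three, lower is better for entropy. Here `labels` are cluster ids and `classes` are reference categories; all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(labels, classes):
    """Count document pairs by (same cluster?, same class?)."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = labels[i] == labels[j]
        same_class = classes[i] == classes[j]
        if same_cluster and same_class:
            a += 1      # pair correctly grouped together
        elif same_cluster:
            b += 1      # grouped together but from different classes
        elif same_class:
            c += 1      # same class but split across clusters
        else:
            d += 1      # correctly kept apart
    return a, b, c, d


def rand_index(labels, classes):
    a, b, c, d = pair_counts(labels, classes)
    return (a + d) / (a + b + c + d)


def jaccard_index(labels, classes):
    a, b, c, _ = pair_counts(labels, classes)
    return a / (a + b + c) if a + b + c else 1.0


def f_measure(labels, classes):
    """For each class take its best-matching cluster's F score, weighted by class size."""
    n = len(classes)
    joint = Counter(zip(classes, labels))
    cluster_sizes = Counter(labels)
    total = 0.0
    for ci, n_i in Counter(classes).items():
        best = 0.0
        for kj, n_j in cluster_sizes.items():
            n_ij = joint[(ci, kj)]
            if n_ij:
                p, r = n_ij / n_j, n_ij / n_i   # precision and recall of cluster kj for class ci
                best = max(best, 2 * p * r / (p + r))
        total += n_i / n * best
    return total


def entropy(labels, classes):
    """Class entropy (bits) inside each cluster, weighted by cluster size."""
    n = len(classes)
    joint = Counter(zip(labels, classes))
    total = 0.0
    for kj, n_j in Counter(labels).items():
        h = -sum(cnt / n_j * math.log2(cnt / n_j)
                 for (lab, _), cnt in joint.items() if lab == kj)
        total += n_j / n * h
    return total
```

For example, with `classes = ["sport", "sport", "economy", "economy"]` and `labels = [0, 0, 0, 1]`, the pair counts are a = 1, b = 2, c = 1, d = 2, giving a Rand index of 0.5 and a Jaccard index of 0.25.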
American Psychological Association (APA)
Kelaiaia, Abd al-Salam & Merouani, Hayah. 2016. Clustering with probabilistic topic models on Arabic texts: a comparative study of LDA and K-means. The International Arab Journal of Information Technology, Vol. 13, no. 2.
https://search.emarefa.net/detail/BIM-580942
Modern Language Association (MLA)
Kelaiaia, Abd al-Salam & Merouani, Hayah. Clustering with probabilistic topic models on Arabic texts: a comparative study of LDA and K-means. The International Arab Journal of Information Technology, Vol. 13, no. 2 (Mar. 2016).
https://search.emarefa.net/detail/BIM-580942
American Medical Association (AMA)
Kelaiaia, Abd al-Salam & Merouani, Hayah. Clustering with probabilistic topic models on Arabic texts: a comparative study of LDA and K-means. The International Arab Journal of Information Technology. 2016. Vol. 13, no. 2.
https://search.emarefa.net/detail/BIM-580942
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references
Record ID
BIM-580942