Clustering Arabic documents using frequent itemset-based hierarchical clustering with an n-grams
Dissertant
Thesis advisor
Committee Members
al-Umari, Mahmud Ahmad
al-Hammuri, Awni Mansur
University
Arab Academy for Financial and Banking Sciences
Faculty
The Faculty of Information Systems and Technology
Department
Computer information systems
University Country
Jordan
Degree
Ph.D.
Degree Date
2008
English Abstract
The availability of large amounts of information in electronic form, from different sources and in different formats, together with the need of organizations to benefit from this information, encourages researchers to develop applications to handle it. Clustering plays an important role in providing intuitive navigation and browsing by organizing large collections of documents into a small number of meaningful groups.
In this research we used a powerful clustering algorithm, Frequent Itemset-based Hierarchical Clustering (FIHC), to cluster Arabic documents based on frequent itemsets, with an n-gram technique used to produce cluster labels.
Arabic is used by more than 265 million Arabs and is understood by more than one billion Muslims worldwide, since the Muslims' holy book (the Koran) is written in Arabic. As Arabic documents have become widely available in electronic form, the need to cluster them has become pressing.
We conducted our experiments on 600 Arabic documents using n-grams at the word level, trigrams and quadgrams, and obtained promising results.
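As an illustration of the word-level n-grams mentioned in the abstract, the following is a minimal Python sketch, not the thesis's implementation. It extracts word n-grams and keeps those that appear in at least a given fraction of the documents, which is one simple way to obtain candidate cluster labels. The function names, the whitespace tokenization, the toy English documents, and the `min_support` value are all assumptions made for the example; real Arabic text would likely need normalization and stemming first.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of text as a list of tuples."""
    words = text.split()  # assumption: plain whitespace tokenization
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def frequent_ngrams(documents, n, min_support):
    """Return n-grams found in at least min_support (a fraction) of documents."""
    doc_counts = Counter()
    for doc in documents:
        # Count each n-gram once per document (document frequency).
        doc_counts.update(set(word_ngrams(doc, n)))
    threshold = min_support * len(documents)
    return {gram for gram, count in doc_counts.items() if count >= threshold}


# Toy documents, purely illustrative.
docs = [
    "the central bank raised interest rates",
    "the central bank lowered interest rates",
    "the football team won the match",
]

trigrams = frequent_ngrams(docs, 3, min_support=0.5)
print(trigrams)
```

Passing `n=4` instead of `n=3` yields quadgrams in the same way.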
Main Subjects
Information Technology and Computer Science
No. of Pages
92
Table of Contents
Table of contents.
Abstract.
Chapter One : introduction.
Chapter Two : literature review.
Chapter Three : Arabic language structure.
Chapter Four : information retrieval and term weighting techniques.
Chapter Five : Frequent Itemset-based hierarchical clustering algorithm.
Chapter Six : research methodology.
Chapter Seven : experimental evaluation.
Chapter Eight : conclusion.
References.
American Psychological Association (APA)
al-Sarayirah, Haytham Salim. (2008). Clustering Arabic documents using frequent itemset-based hierarchical clustering with an n-grams. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan.
https://search.emarefa.net/detail/BIM-306260
Modern Language Association (MLA)
al-Sarayirah, Haytham Salim. Clustering Arabic documents using frequent itemset-based hierarchical clustering with an n-grams. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences. (2008).
https://search.emarefa.net/detail/BIM-306260
American Medical Association (AMA)
al-Sarayirah, Haytham Salim. (2008). Clustering Arabic documents using frequent itemset-based hierarchical clustering with an n-grams. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan.
https://search.emarefa.net/detail/BIM-306260
Language
English
Data Type
Arab Theses
Record ID
BIM-306260