Clustering Arabic documents using frequent itemset-based hierarchical clustering with an n-grams

Dissertant

al-Sarayirah, Haytham Salim

Thesis advisor

al-Shalabi, Riyad

Committee Members

al-Umari, Mahmud Ahmad
al-Hammuri, Awni Mansur

University

Arab Academy for Financial and Banking Sciences

Faculty

The Faculty of Information Systems and Technology

Department

Computer information systems

University Country

Jordan

Degree

Ph.D.

Degree Date

2008

English Abstract

The availability of large amounts of information in electronic form, from different sources and in different formats, together with the need of organizations to benefit from this information, encourages researchers to develop applications to handle it. Clustering plays an important role in providing intuitive navigation and browsing techniques by organizing large collections of documents into a small number of meaningful groups.

In this research we used a powerful clustering algorithm, Frequent Itemset-based Hierarchical Clustering (FIHC), to cluster Arabic documents based on frequent itemsets, with n-grams used as cluster labels.
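As a rough illustration of the idea of clustering on frequent itemsets, the sketch below finds word sets that occur in at least a minimum fraction of the documents and treats the documents covered by each set as a candidate cluster. It is a brute-force toy under stated assumptions, not the thesis implementation: the function name `frequent_itemsets`, the `min_support` and `max_size` parameters, and the English placeholder tokens are all hypothetical, and the later steps of the algorithm (making clusters disjoint and building the hierarchy) are not shown.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Map each frequent word set to the indices of the documents containing it.

    A word set is "frequent" when it appears in at least `min_support`
    (a fraction between 0 and 1) of the documents. Brute force over the
    vocabulary, so only suitable for tiny illustrative inputs.
    """
    doc_sets = [set(d.split()) for d in docs]
    threshold = min_support * len(docs)
    vocab = sorted(set().union(*doc_sets))
    result = {}
    for size in range(1, max_size + 1):
        for items in combinations(vocab, size):
            covering = {i for i, words in enumerate(doc_sets)
                        if words.issuperset(items)}
            if len(covering) >= threshold:
                result[frozenset(items)] = covering
    return result


# Placeholder tokens standing in for preprocessed Arabic terms (assumption).
docs = [
    "bank loan interest",
    "bank loan credit",
    "football match goal",
    "football match referee",
]
clusters = frequent_itemsets(docs, min_support=0.5)
```

Here the itemset {bank, loan} covers the first two documents and {football, match} covers the last two, so each frequent itemset doubles as a description of its candidate cluster.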

Arabic is used by more than 265 million Arabs and is understood by more than one billion Muslims worldwide, since the Muslims' holy book (the Koran) is written in Arabic. As Arabic documents have become widespread in electronic form, the need to cluster them has become pressing.

We conducted our experiments on 600 Arabic documents using word-level n-grams, trigrams and quadgrams, and obtained promising results.

Main Subjects

Information Technology and Computer Science

No. of Pages

92

Table of Contents

Table of contents.

Abstract.

Chapter One: Introduction.

Chapter Two: Literature review.

Chapter Three: Arabic language structure.

Chapter Four: Information retrieval and term weighting techniques.

Chapter Five: Frequent Itemset-based hierarchical clustering algorithm.

Chapter Six: Research methodology.

Chapter Seven: Experimental evaluation.

Chapter Eight: Conclusion.

References.

American Psychological Association (APA)

al-Sarayirah, Haytham Salim. (2008). Clustering Arabic documents using frequent itemset-based hierarchical clustering with an n-grams. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan
https://search.emarefa.net/detail/BIM-306260

Modern Language Association (MLA)

al-Sarayirah, Haytham Salim. Clustering Arabic documents using frequent itemset-based hierarchical clustering with an n-grams. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences. (2008).
https://search.emarefa.net/detail/BIM-306260

American Medical Association (AMA)

al-Sarayirah, Haytham Salim. (2008). Clustering Arabic documents using frequent itemset-based hierarchical clustering with an n-grams. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan
https://search.emarefa.net/detail/BIM-306260

Language

English

Data Type

Arab Theses

Record ID

BIM-306260