Clustering Arabic documents using frequent itemset-based hierarchical clustering with an n-grams

Thesis author

al-Sarayirah, Haytham Salim

Thesis advisor

al-Shalabi, Riyad

Committee members

al-Umari, Mahmud Ahmad
al-Hammuri, Awni Mansur

University

Arab Academy for Financial and Banking Sciences

Faculty

Faculty of Information Systems and Technology

Academic department

Department of Computer Information Systems

University country

Jordan

Degree

Doctorate

Degree date

2008

English abstract

The availability of large amounts of information in electronic form, from different sources and in different formats, together with the need of organizations to benefit from this information, encourages researchers to develop applications to handle it. Clustering plays an important role in providing intuitive navigation and browsing techniques by organizing large collections of documents into a small number of meaningful groups.

In this research we used a powerful clustering algorithm, Frequent Itemset-based Hierarchical Clustering (FIHC), to cluster Arabic documents based on frequent itemsets, with an N-gram technique used to produce the cluster labels.

Arabic is used by more than 265 million Arabs and is understood by more than one billion Muslims worldwide, since the Muslims' holy book (the Koran) is written in Arabic. As Arabic documents have become widely available in electronic form, the need to cluster them has become pressing.

We conducted our experiments on 600 Arabic documents using N-grams at the word level, trigrams and quadgrams, and obtained promising results.
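As a rough illustration of the two ideas the abstract names, the sketch below extracts word-level N-grams from a text and keeps those that occur in at least a given fraction of the documents, which is the kind of support counting that frequent-itemset methods rely on. It is not the thesis's implementation and it is not the FIHC algorithm itself: the function names `word_ngrams` and `frequent_ngrams`, the whitespace tokenization, and the `min_support` parameter are all assumptions made for this example, and real Arabic text would normally need normalization and stemming first.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of a text (naive whitespace tokenization)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(docs, n, min_support):
    """Return the n-grams found in at least min_support (a fraction) of docs."""
    doc_freq = Counter()
    for doc in docs:
        # Use a set so each n-gram is counted at most once per document.
        doc_freq.update(set(word_ngrams(doc, n)))
    threshold = min_support * len(docs)
    return {gram for gram, count in doc_freq.items() if count >= threshold}
```

Calling `frequent_ngrams(docs, 3, 0.5)` would return the word trigrams shared by at least half of the documents in `docs`; such shared N-grams are one plausible source of candidate cluster labels.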

Main subjects

Information Technology and Computer Science

Topics

Number of pages

92

Table of contents

Table of contents.

Abstract.

Chapter One: Introduction.

Chapter Two: Literature review.

Chapter Three: Arabic language structure.

Chapter Four: Information retrieval and term weighting techniques.

Chapter Five: Frequent itemset-based hierarchical clustering algorithm.

Chapter Six: Research methodology.

Chapter Seven: Experimental evaluation.

Chapter Eight: Conclusion.

References.

American Psychological Association (APA) citation style

al-Sarayirah, Haytham Salim. (2008). Clustering Arabic documents using frequent itemset-based hierarchical clustering with an n-grams. (Doctoral dissertations Theses and Dissertations Master). Arab Academy for Financial and Banking Sciences, Jordan
https://search.emarefa.net/detail/BIM-306260

Modern Language Association (MLA) citation style

al-Sarayirah, Haytham Salim. Clustering Arabic documents using frequent itemset-based hierarchical clustering with an n-grams. (Doctoral dissertations Theses and Dissertations Master). Arab Academy for Financial and Banking Sciences. (2008).
https://search.emarefa.net/detail/BIM-306260

American Medical Association (AMA) citation style

al-Sarayirah, Haytham Salim. (2008). Clustering Arabic documents using frequent itemset-based hierarchical clustering with an n-grams. (Doctoral dissertations Theses and Dissertations Master). Arab Academy for Financial and Banking Sciences, Jordan
https://search.emarefa.net/detail/BIM-306260

Text language

English

Data type

Theses and dissertations

Record number

BIM-306260