Semi-supervised learning using clustering and associative classification mining

Thesis author

Abd Allah, Nada Sulayman Hasan

Thesis supervisor

al-Khalidi, Jihad O.
Thabtah, Fadi Abd al-Jabir

Committee members

al-Rababiah, Mamun S.
Ababinah, Ismail M.
Hamidi, Ismail I.

University

Al al-Bayt University

Faculty

Prince Hussein Bin Abdullah College for Information Technology

Academic department

Department of Computer Science

University country

Jordan

Degree

Master's

Degree date

2011

English abstract

The amount of information available on the web has grown rapidly in recent years, which makes organizing this information increasingly complicated.

For this reason, it is important to find a tool that helps users find useful information and organize it.

Often, the search for a document can be performed with keywords or by browsing a catalogue (such as MSN) in which documents are organized into categories.

Text categorization has become one of the key techniques for handling and organizing text data.

The goal of text categorization is to classify documents into a certain number of pre-defined categories.

In recent years, a classification technique that combines the advantages of association rule mining and classification, called classification using association (CUA), was proposed. It explores all associations between attribute values and their classes in the training data set, aiming to construct high-quality classifiers.

The goal of CUA is to produce useful knowledge that is usually missed by traditional methods, which should in turn improve predictive accuracy in applications.
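
As a rough illustration of what an association between attribute values and a class looks like, the sketch below computes the support and confidence of a single rule "attribute values -> class" on a toy training set. The data, the attribute names, and the helper `rule_stats` are hypothetical and are not taken from the thesis; this is not MCAR or any specific CUA algorithm, only the two standard rule measures such algorithms typically rely on.

```python
from typing import Dict, List, Tuple

Row = Tuple[Dict[str, str], str]  # (attribute values, class label)


def rule_stats(rows: List[Row], antecedent: Dict[str, str], label: str) -> Tuple[float, float]:
    """Return (support, confidence) of the rule `antecedent -> label`."""
    # Class labels of all rows whose attributes match the antecedent.
    matches = [cls for attrs, cls in rows
               if all(attrs.get(k) == v for k, v in antecedent.items())]
    if not rows or not matches:
        return 0.0, 0.0
    hits = sum(1 for cls in matches if cls == label)
    support = hits / len(rows)        # fraction of all rows covered correctly
    confidence = hits / len(matches)  # fraction of matching rows with this class
    return support, confidence


# Hypothetical toy training set (not from the thesis).
TRAINING: List[Row] = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]

support, confidence = rule_stats(TRAINING, {"outlook": "sunny"}, "play")
```

An associative classifier would typically keep only rules whose support and confidence exceed user-chosen thresholds and then rank them to form the classifier.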

In previous research, supervised learning has relied on a large number of labeled training documents.

Labeling these documents is usually expensive, time consuming, and problematic.

Thus, it is easier to build labeled data sets from unlabeled documents; some researchers have suggested doing so with semi-supervised techniques that combine clustering methods, such as K-means or Expectation Maximization (EM), with classification data mining.

This approach of utilizing unlabeled documents to produce classification models is called semi-supervised learning: the input data contain both labeled and unlabeled examples, and the unlabeled training data are used in supervised learning tasks such as classification data mining.

In other words, unsupervised tasks like clustering can be employed as an initial step before supervised classification begins.

According to several researchers, the semi-supervised learning approach usually improves the performance of classification algorithms, since data are assigned categories automatically rather than by human assignment, which requires care and experience.

The proposed approach helps to integrate large amounts of unlabeled data into the supervised learning process, so that a large number of documents can be used for supervised learning.

The experimental results are generated using a machine learning tool called WEKA.

The proposed approach consists of two main steps. The first step, clustering, is carried out in this research using the EM algorithm because of its success in previous research.

EM is used to cluster both the labeled and unlabeled data within the training data set so that the labeled documents within each cluster determine the unknown labels for the unlabeled documents.

This should simplify the training process during the classification step since all documents are now associated with labels.
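
The labeling idea in the clustering step can be sketched as follows, assuming that cluster assignments are already available (for example, from an EM run) and that each unlabeled document takes the majority label of the labeled documents in its cluster. The majority-vote rule, the function name `propagate_labels`, and the toy data are assumptions made for illustration; the abstract does not state exactly how the labels are determined.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def propagate_labels(cluster_ids: List[int],
                     labels: List[Optional[str]]) -> List[Optional[str]]:
    """Fill in missing labels (None) using a majority vote inside each cluster."""
    # Count the known labels per cluster.
    votes: Dict[int, Counter] = defaultdict(Counter)
    for cid, lab in zip(cluster_ids, labels):
        if lab is not None:
            votes[cid][lab] += 1

    result: List[Optional[str]] = []
    for cid, lab in zip(cluster_ids, labels):
        if lab is not None:
            result.append(lab)                               # keep known labels
        elif votes[cid]:
            result.append(votes[cid].most_common(1)[0][0])   # majority label
        else:
            result.append(None)  # cluster contains no labeled documents
    return result


# Hypothetical example: six documents in three clusters, three of them labeled.
cluster_ids = [0, 0, 0, 1, 1, 2]
labels = ["sports", "sports", None, "finance", None, None]
completed = propagate_labels(cluster_ids, labels)
```

A cluster without any labeled document stays unlabeled in this sketch; a real pipeline would need a policy for such documents, such as discarding them or assigning them to the nearest labeled cluster.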

The second step (classification) utilizes the resulting clusters produced in the previous step as features for the associative classification algorithm.

Specifically, a highly accurate associative classification algorithm called Multi-class Classification based on Association Rule (MCAR) has been used to learn the classifiers.

The results in this thesis are generated from two different data repositories, i.e., UCI and Reuters, using different evaluation measures.

The experimental results show that the MCAR algorithm outperforms the other rule learning techniques on the majority of the data sets and achieves a lower error rate.

In addition, the proposed approach outperforms some popular classification and associative classification (AC) algorithms, such as C4.5 and ARC-BC, and achieves average precision and recall similar to those of other AC algorithms.
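
For readers unfamiliar with the measures mentioned above, the following sketch shows one common way to compute average precision and recall over several classes (macro averaging). The abstract does not say which averaging scheme the thesis uses, so macro averaging, the function name `macro_precision_recall`, and the toy labels are assumptions.

```python
from typing import List, Tuple


def macro_precision_recall(y_true: List[str], y_pred: List[str]) -> Tuple[float, float]:
    """Return (precision, recall), each averaged with equal weight per class."""
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        return 0.0, 0.0
    precisions, recalls = [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


# Hypothetical predictions for four documents in two classes.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
precision, recall = macro_precision_recall(y_true, y_pred)
```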

Finally, the results show that using the MCAR algorithm improves the results of the proposed approach, especially since MCAR achieves good results when compared with the other classification algorithms.

Main subjects

Information Technology and Computer Science

Number of pages

97

Table of contents

Table of contents.

Abstract.

Chapter One: Introduction.

Chapter Two: Theoretical background.

Chapter Three: Proposed approach.

Chapter Four: Conclusions and future work.

References.

American Psychological Association (APA) citation style

Abd Allah, Nada Sulayman Hasan. (2011). Semi-supervised learning using clustering and associative classification mining. (Master's thesis). Al al-Bayt University, Jordan.
https://search.emarefa.net/detail/BIM-321336

Modern Language Association (MLA) citation style

Abd Allah, Nada Sulayman Hasan. Semi-supervised learning using clustering and associative classification mining. (Master's thesis). Al al-Bayt University. (2011).
https://search.emarefa.net/detail/BIM-321336

American Medical Association (AMA) citation style

Abd Allah, Nada Sulayman Hasan. (2011). Semi-supervised learning using clustering and associative classification mining. (Master's thesis). Al al-Bayt University, Jordan.
https://search.emarefa.net/detail/BIM-321336

Text language

English

Data type

Theses

Record number

BIM-321336