An efficient associative classification algorithm for text categorization

Other Title(s)

خوارزمية فعالة معتمدة على التصنيف الترابطي لتصنيف النصوص

Dissertant

Abu Rumman, Bashshar Sulayman Abd Allah

Thesis advisor

Thabtah, Fadi Fayiz Abd al-Jabir

Committee Members

Ahmad, Mamun Khalid
Aqil, Misbah M.

University

Middle East University

Faculty

Faculty of Information Technology

Department

Department of Computer Information Systems

University Country

Jordan

Degree

Master

Degree Date

2012

English Abstract

Text categorization (TC) is an active research area that has attracted many researchers because of the large quantities of textual documents available online and offline.

TC is concerned with automatically classifying textual data into one or more classes based on the keywords in their content.

Many different classification approaches have been developed to categorize textual data; these approaches are evaluated mainly by their accuracy and by the knowledge they produce.

Moreover, these classification approaches range from high-accuracy methods such as neural networks to lower-accuracy ones such as Naïve Bayes.

However, one fundamental evaluation criterion is how well the end user can understand the resulting classifiers.

Some classification approaches, such as neural networks, produce classifiers that are accurate yet difficult to understand.

Recently, a new classification technique in data mining called Associative Classification (AC) has been developed; based on association rules, it combines high accuracy with understandable output.

According to many experimental studies, AC is a highly efficient method that builds more predictive and accurate classification systems than traditional classification methods such as probabilistic approaches and the k-nearest neighbor algorithm.

AC produces rule-based classifiers of the form (if condition then class) that are easy for the end user to understand and manipulate.
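As a rough illustration of what such a rule-based classifier can look like, the sketch below stores each rule as a set of keywords plus a class label and assigns a document the label of the first rule it satisfies. The `Rule` class, the `classify` function, the rules, and the confidence values are all hypothetical; they are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    keywords: frozenset  # rule body: keywords that must all appear
    label: str           # rule head: the predicted class
    confidence: float    # illustrative value, not measured on real data


# Hypothetical rules, listed in descending order of confidence.
rules = [
    Rule(frozenset({"crude", "oil"}), "energy", 0.95),
    Rule(frozenset({"wheat"}), "grain", 0.90),
    Rule(frozenset({"price"}), "economy", 0.60),
]


def classify(words, rules, default="unknown"):
    """Return the label of the first rule whose keywords all occur in words."""
    for rule in rules:
        if rule.keywords <= words:  # subset test: every keyword is present
            return rule.label
    return default


label = classify({"crude", "oil", "price", "rises"}, rules)
```

Because each rule is just a readable keyword set and a label, an end user can inspect, reorder, or delete rules directly, which is the understandability property the abstract refers to.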

This research is devoted to developing a new AC-based model for the text categorization problem.

We focus on three main steps in the TC problem: (1) Developing an efficient and fast intersection method based on the DiffSet approach of the Eclat association rule method and adapting it to unstructured classification data.

(2) Proposing a novel rule filtering procedure that reduces the number of rules in the resulting classifiers by considering partial matching while building the classifier.

This novel method significantly reduced the number of rules produced by the proposed model when compared with current AC models such as MCAR.

Lastly, (3) improving the accuracy of the resulting classifiers by considering multiple rules in the classification step, unlike most current AC algorithms, which use only one rule to assign a class value to the test case.
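For step (1), the general diffset idea from the Eclat family can be sketched as follows. This is a minimal illustration of the standard technique on a toy corpus, not the thesis's own intersection method; the documents, keywords, and variable names are assumptions made for the example. Instead of intersecting full document-id sets, a diffset records only the documents that contain a prefix but not the extended itemset, and support is obtained by subtraction.

```python
# Toy corpus: document id -> set of keywords (hypothetical data).
docs = {
    1: {"oil", "price", "crude"},
    2: {"oil", "price"},
    3: {"oil", "crude"},
    4: {"wheat", "price"},
    5: {"oil", "price", "crude"},
}


def tidset(item):
    """Ids of the documents that contain the given keyword."""
    return {d for d, words in docs.items() if item in words}


t_oil, t_price, t_crude = tidset("oil"), tidset("price"), tidset("crude")

# Tidset-based support (plain Eclat): intersect the id sets.
support_tidset = len(t_oil & t_price & t_crude)

# Diffset-based support, with "oil" as the shared prefix.
# d(oil, x) = t(oil) - t(x): documents with "oil" but without x.
d_oil_price = t_oil - t_price
d_oil_crude = t_oil - t_crude
sup_oil_price = len(t_oil) - len(d_oil_price)

# d(oil, price, crude) = d(oil, crude) - d(oil, price)
d_oil_price_crude = d_oil_crude - d_oil_price
support_diffset = sup_oil_price - len(d_oil_price_crude)
```

Both routes give the same support for {oil, price, crude}. The diffsets are typically much smaller than the id sets they replace when keywords co-occur often, which is where the speed-up comes from.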

Experimental results on a real-world textual data set, Reuters, indicated that the proposed model outperforms other text categorization techniques, both traditional and AC-based.

Main Subjects

Information Technology and Computer Science

No. of Pages

79

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction.

Chapter Two : Literature review and related work.

Chapter Three : Methodology.

Chapter Four : Result analysis.

Chapter Five : Conclusion and future work.

References.

American Psychological Association (APA)

Abu Rumman, Bashshar Sulayman Abd Allah. (2012). An efficient associative classification algorithm for text categorization. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-694310

Modern Language Association (MLA)

Abu Rumman, Bashshar Sulayman Abd Allah. An efficient associative classification algorithm for text categorization. (Master's thesis). Middle East University. (2012).
https://search.emarefa.net/detail/BIM-694310

American Medical Association (AMA)

Abu Rumman, Bashshar Sulayman Abd Allah. (2012). An efficient associative classification algorithm for text categorization. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-694310

Language

English

Data Type

Arab Theses

Record ID

BIM-694310