Efficiency improvement of data-mining with two-dimensional data reduction

Dissertant

al-Udat, Ahmad Musa M.

Thesis advisor

Abu Suud, Salih

University

Arab Academy for Financial and Banking Sciences

Faculty

The Faculty of Information Systems and Technology

University Country

Jordan

Degree

Ph.D.

Degree Date

2007

English Abstract

Discretization is a technique that transforms continuous variables into discrete variables for inductive machine learning and data mining systems.
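
As a minimal illustration of this definition (not taken from the thesis), the sketch below maps continuous values to interval indices given a list of cut points. The function name `discretize`, the sample values, and the cut points are all hypothetical and chosen only for the example.

```python
from bisect import bisect_right

def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in.

    With k cut points there are k + 1 intervals, numbered 0..k.
    A value equal to a cut point is assigned to the interval on its right.
    """
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) for v in values]

# Hypothetical measurements and cut points, for illustration only.
sample_values = [1.4, 4.7, 5.9, 1.3, 4.5]
sample_cuts = [2.5, 4.8]
interval_ids = discretize(sample_values, sample_cuts)
```

How the cut points are chosen is the actual subject of a discretization algorithm; here they are simply given.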

In machine learning and data mining research, inductive learning systems are widely used to acquire classification knowledge from a set of given samples.

Most classification algorithms in machine learning and data mining can only be applied to nominal data.

They cannot effectively deal with continuous attributes directly.

In practice, however, a large portion of real-world data sets contain continuous data and/or data of mixed types (continuous, discrete, ordinal, and nominal).

For these kinds of data, the continuous variables need to be discretized.

Systems that are designed for continuous attributes can attain a higher accuracy when the data are given with appropriate discrete values.

This improves the performance of the inductive learning process.

Hence, the limitation of most inductive learning algorithms can be overcome by discretizing the continuous attributes appropriately before feeding the data into the learning systems.

Discretization methods have come into use with the rapid development of computer technology, as machine learning systems are challenged to extract knowledge from huge database repositories.

This thesis presents a new discretization algorithm called ClassifyChi2.

The ClassifyChi2 algorithm depends on chi-square statistical methods; it is supervised, static, and global, and it belongs to the merging technique.
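
The abstract does not give the details of ClassifyChi2, so the following is only a generic sketch of chi-square-based bottom-up merging in the style of ChiMerge, not the thesis's algorithm. The function names, the `threshold` parameter, and the toy data are assumptions made for illustration.

```python
from collections import Counter

def chi_square(counts_a, counts_b):
    """Chi-square statistic for two adjacent intervals, given per-class counts."""
    classes = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    total = total_a + total_b
    chi2 = 0.0
    for row, row_total in ((counts_a, total_a), (counts_b, total_b)):
        for c in classes:
            col_total = counts_a.get(c, 0) + counts_b.get(c, 0)
            expected = row_total * col_total / total
            if expected > 0:
                chi2 += (row.get(c, 0) - expected) ** 2 / expected
    return chi2

def merge_intervals(values, labels, threshold):
    """Start with one interval per distinct value, then repeatedly merge the
    adjacent pair with the lowest chi-square until every pair reaches the
    threshold. Returns the lower bounds of all intervals but the first,
    which serve as cut points."""
    grouped = {}
    for v, y in zip(values, labels):
        grouped.setdefault(v, Counter())[y] += 1
    # Each interval is (lower bound, class counts), ordered by lower bound.
    intervals = [(v, grouped[v]) for v in sorted(grouped)]
    while len(intervals) > 1:
        scores = [chi_square(intervals[i][1], intervals[i + 1][1])
                  for i in range(len(intervals) - 1)]
        i = min(range(len(scores)), key=scores.__getitem__)
        if scores[i] >= threshold:
            break  # all adjacent pairs differ enough in class distribution
        lower, counts = intervals[i]
        intervals[i:i + 2] = [(lower, counts + intervals[i + 1][1])]
    return [lower for lower, _ in intervals[1:]]

# Toy data, for illustration only: two classes separated between 3 and 4.
toy_values = [1, 2, 3, 4, 5, 6]
toy_labels = ["a", "a", "a", "b", "b", "b"]
cut_points = merge_intervals(toy_values, toy_labels, threshold=2.706)
```

The sketch is supervised (it uses class labels), static and global (it processes one attribute over the whole data set before learning), and merging-based, which matches the properties listed above. The threshold 2.706 is the chi-square critical value at the 0.10 significance level with one degree of freedom, picked here only as an example.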

Three algorithms (ClassifyChi2, ChiMerge, and Chi2) are implemented for comparison purposes, and three criteria (starting number of intervals, number of cut points with accuracy, and execution time) are used to measure the performance of the algorithms. The algorithms are run over two well-known data sets, Iris and Waveform.

Across the three algorithms and the three measures, the best results always come from the ClassifyChi2 algorithm.

The ClassifyChi2 algorithm achieved the minimum number of starting intervals, the minimum number of intervals (cut-off points) with good accuracy, and the minimum execution time.

Main Subjects

Information Technology and Computer Science

No. of Pages

94

Table of Contents

Table of contents.

Abstract.

Chapter one : Introduction.

Chapter two : Machine learning.

Chapter three : Feature-discretization technique.

Chapter four : Related works.

Chapter five : ClassifyChi2 discretization algorithm.

Chapter six : The experimental results.

Chapter seven : Conclusion and future works.

References.

American Psychological Association (APA)

al-Udat, Ahmad Musa M. (2007). Efficiency improvement of data-mining with two-dimensional data reduction. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan
https://search.emarefa.net/detail/BIM-304926

Modern Language Association (MLA)

al-Udat, Ahmad Musa M. Efficiency improvement of data-mining with two-dimensional data reduction. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences. (2007).
https://search.emarefa.net/detail/BIM-304926

American Medical Association (AMA)

al-Udat, Ahmad Musa M. (2007). Efficiency improvement of data-mining with two-dimensional data reduction. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan
https://search.emarefa.net/detail/BIM-304926

Language

English

Data Type

Arab Theses

Record ID

BIM-304926