Efficiency improvement of data-mining with two-dimensional data reduction

Dissertation author

al-Udat, Ahmad Musa M.

Dissertation supervisor

Abu Suud, Salih

University

Arab Academy for Financial and Banking Sciences

Faculty

Faculty of Information Systems and Technology

University country

Jordan

Degree

Doctorate

Degree date

2007

English abstract

Discretization is a technique that transforms continuous variables into discrete variables for inductive machine learning and data mining systems.

In machine learning and data mining research, inductive learning systems are widely used to acquire classification knowledge from a set of given samples.

Most classification algorithms in machine learning and data mining can only be applied to nominal data.

They cannot effectively deal with continuous attributes directly.

In practice, however, a large portion of real-world data sets contain continuous data and/or data of mixed types (continuous, discrete, ordinal, and nominal).

For these kinds of data, the continuous variables need to be discretized.
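
As a minimal illustration of what discretizing a continuous variable means, the sketch below maps each value to the index of the interval it falls in, given a sorted list of cut points. The function name `discretize`, the sample values, and the cut points are assumptions made for illustration; they are not taken from the thesis.

```python
from bisect import bisect_right


def discretize(values, cut_points):
    """Map each continuous value to the index of its interval.

    cut_points must be sorted in ascending order; k cut points define
    k + 1 intervals. A value equal to a cut point is assigned to the
    interval on its right.
    """
    return [bisect_right(cut_points, v) for v in values]


# Hypothetical measurements and two assumed cut points.
lengths = [1.4, 4.7, 5.1, 1.3, 6.0]
bins = discretize(lengths, [2.5, 4.9])
print(bins)  # [0, 1, 2, 0, 2]
```

The resulting interval indices can then be treated as nominal values by a classifier that does not accept continuous attributes.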

Even systems that are designed for continuous attributes can attain higher accuracy when the data are supplied as appropriate discrete values.

This, in turn, improves the performance of the inductive learning process.

Hence, the limitation of most inductive learning algorithms can be overcome by discretizing the continuous attributes appropriately before feeding the data into the learning systems.

Discretization methods have come into wider use with the rapid development of computer technology, as machine learning systems are challenged to extract knowledge from huge database repositories.

This thesis presents a new discretization algorithm called ClassifyChi2.

The ClassifyChi2 algorithm depends on chi-square statistical methods; it is supervised, static, and global, and it belongs to the merging family of techniques.
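
The abstract does not describe the internals of ClassifyChi2, so the following is only a sketch of the general chi-square merging idea, in the style of ChiMerge rather than the author's algorithm. It starts with one interval per distinct value, computes the chi-square statistic for each pair of adjacent intervals from their class counts, and repeatedly merges the pair with the lowest statistic until that statistic exceeds a threshold. The function names, the threshold of 2.71, and the toy data are assumptions.

```python
from collections import Counter


def chi2_adjacent(a, b):
    """Chi-square statistic for two adjacent, non-empty intervals.

    a and b are Counters mapping class label -> count in each interval.
    """
    classes = set(a) | set(b)
    total = sum(a.values()) + sum(b.values())
    stat = 0.0
    for row in (a, b):
        row_total = sum(row.values())
        for c in classes:
            # Expected count under independence of interval and class.
            expected = row_total * (a[c] + b[c]) / total
            stat += (row[c] - expected) ** 2 / expected
    return stat


def chi_merge(values, labels, threshold):
    """Bottom-up merging; returns the lower bounds of all intervals
    after the first one, which serve as the cut points."""
    # One initial interval per distinct value, holding its class counts.
    counts = {}
    for v, y in zip(values, labels):
        counts.setdefault(v, Counter())[y] += 1
    lows = sorted(counts)
    intervals = [counts[v] for v in lows]

    while len(intervals) > 1:
        stats = [chi2_adjacent(intervals[i], intervals[i + 1])
                 for i in range(len(intervals) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] > threshold:
            break  # every adjacent pair differs enough; stop merging
        intervals[i] = intervals[i] + intervals[i + 1]
        del intervals[i + 1]
        del lows[i + 1]
    return lows[1:]


# Toy data: class "a" for small values, class "b" for large ones.
cuts = chi_merge([1, 2, 3, 10, 11, 12], list("aaabbb"), threshold=2.71)
print(cuts)  # [10]
```

The method is supervised because the class labels drive the merging, and it is a merging (bottom-up) technique because it begins with many small intervals and combines them, in contrast to splitting methods that start from a single interval.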

Three algorithms (ClassifyChi2, ChiMerge, and Chi2) are implemented for comparison purposes, and three criteria (starting number of intervals, number of cut points with accuracy, and execution time) are used to measure their performance. The algorithms are run over two well-known data sets, Iris and Waveform.

Running the three algorithms against the three measures shows that the best results always come from the ClassifyChi2 algorithm.

ClassifyChi2 achieved the minimum number of starting intervals, the minimum number of intervals (cut points) with good accuracy, and the minimum execution time.

Main subjects

Information Technology and Computer Science

Topics

Number of pages

94

Table of contents

Table of contents.

Abstract.

Chapter one : Introduction.

Chapter two : Machine learning.

Chapter three : Feature-discretization technique.

Chapter four : Related works.

Chapter five : ClassifyChi2 discretization algorithm.

Chapter six : The experimental results.

Chapter seven : Conclusion and future works.

References.

American Psychological Association (APA) citation style

al-Udat, Ahmad Musa M. (2007). Efficiency improvement of data-mining with two-dimensional data reduction (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan.
https://search.emarefa.net/detail/BIM-304926

Modern Language Association (MLA) citation style

al-Udat, Ahmad Musa M. Efficiency improvement of data-mining with two-dimensional data reduction. Doctoral dissertation. Arab Academy for Financial and Banking Sciences, 2007.
https://search.emarefa.net/detail/BIM-304926

American Medical Association (AMA) citation style

al-Udat, Ahmad Musa M. (2007). Efficiency improvement of data-mining with two-dimensional data reduction (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan.
https://search.emarefa.net/detail/BIM-304926

Text language

English

Data type

Theses

Record number

BIM-304926