Concepts seeds gathering and dataset updating algorithm for handling concept drift

Other Title(s)

خوارزمية جمع بذور المفاهيم و تحديث مجموعة البيانات لمعالجة تغير المفهوم

Dissertant

al-Buhissi, Ibrahim

Thesis advisor

Hiwahi, Nabil M.

University

Islamic University

Faculty

Faculty of Information Technology

University Country

Palestine (Gaza Strip)

Degree

Master

Degree Date

2015

English Abstract

Our lives never stop evolving and changing, and our systems should adapt to such behavior.

Data mining is an important tool that helps us extract valuable information from hidden patterns in data.

The main task in data mining is learning models.

The traditional way for learning is called batch learning, which assumes that all training examples are available at the time of learning.

In data mining, the phenomenon of change in data distribution over time is known as concept drift.

Traditional classification models do not handle this change.

In this research, we introduce a new approach, the Concepts Seeds Gathering and Dataset Updating algorithm (CSG-DU), that gives traditional classification models the ability to adapt to concept drift over time.

CSG-DU is concerned with discovering new concepts in a data stream; its main goal is to increase classification accuracy, with any classification model, when the underlying concepts change.

Concept drift is handled by selecting the data instances that represent the new concepts and injecting them into the training dataset.

Our proposed approach was tested on synthetic and real datasets that represent different types of concept drift (sudden, gradual, and incremental).

The experiments show that applying our approach raised classification accuracy from low values to high, acceptable ones.

Finally, we compared CSG-DU with the Set Formation for Delayed Labeling algorithm (SFDL), an approach that handles sudden and gradual concept drift.

The results indicate that our proposed approach outperforms SFDL in terms of classification accuracy.
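
The abstract's idea of selecting instances that represent a new concept and injecting them into the training dataset can be illustrated with a minimal sketch. This is not the CSG-DU algorithm itself, whose selection procedure is not described in this record. The sketch assumes a toy one-dimensional stream, a 1-nearest-neighbor classifier, labels that are available for each incoming batch, and a hypothetical selection rule (`select_seeds`) that keeps the instances the current model misclassifies. All names and data values are illustrative.

```python
from typing import List, Tuple

Instance = Tuple[float, int]  # (feature value, class label)


def predict_1nn(train: List[Instance], x: float) -> int:
    """Return the label of the training instance closest to x."""
    nearest = min(train, key=lambda inst: abs(inst[0] - x))
    return nearest[1]


def accuracy(train: List[Instance], batch: List[Instance]) -> float:
    """Fraction of batch instances the 1-NN model classifies correctly."""
    correct = sum(1 for x, y in batch if predict_1nn(train, x) == y)
    return correct / len(batch)


def select_seeds(train: List[Instance], batch: List[Instance]) -> List[Instance]:
    """Hypothetical rule (an assumption, not taken from the thesis):
    treat labeled instances that the current model gets wrong as
    representatives of a new concept."""
    return [(x, y) for x, y in batch if predict_1nn(train, x) != y]


# Original concept: class 0 lies near 0, class 1 lies near 10.
train = [(-1.0, 0), (0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1), (11.0, 1)]

# Sudden drift: class 1 now appears near -10 instead of near 10.
drifted_batch = [(-0.5, 0), (0.5, 0), (-9.0, 1), (-10.0, 1), (-11.0, 1), (-10.5, 1)]

# A later batch drawn from the same drifted concept, used only for evaluation.
later_batch = [(0.2, 0), (-0.8, 0), (-9.5, 1), (-10.2, 1)]

before = accuracy(train, later_batch)

# Gather the seed instances and inject them into the training dataset.
seeds = select_seeds(train, drifted_batch)
updated_train = train + seeds

after = accuracy(updated_train, later_batch)

print(f"seeds injected: {len(seeds)}")
print(f"accuracy before update: {before:.2f}")
print(f"accuracy after update:  {after:.2f}")
```

In this toy setting the stale training set misclassifies every class 1 instance from the drifted region, and the updated set recovers them. A 1-nearest-neighbor model is used here only because injected instances change its predictions locally without retraining; the abstract states that the approach is intended to work with any classification model.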

Main Subjects

Information Technology and Computer Science

No. of Pages

58

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Related work.

Chapter Three: Methodology and the proposed model.

Chapter Four: Experimental results and evaluation.

Chapter Five: Conclusion and future work.

References.

American Psychological Association (APA)

al-Buhissi, Ibrahim. (2015). Concepts seeds gathering and dataset updating algorithm for handling concept drift (Master's thesis). Islamic University, Palestine (Gaza Strip).
https://search.emarefa.net/detail/BIM-611204

Modern Language Association (MLA)

al-Buhissi, Ibrahim. Concepts seeds gathering and dataset updating algorithm for handling concept drift. Master's thesis. Islamic University, 2015.
https://search.emarefa.net/detail/BIM-611204

American Medical Association (AMA)

al-Buhissi, Ibrahim. (2015). Concepts seeds gathering and dataset updating algorithm for handling concept drift (Master's thesis). Islamic University, Palestine (Gaza Strip).
https://search.emarefa.net/detail/BIM-611204

Language

English

Data Type

Arab Theses

Record ID

BIM-611204