Using clustering and association rules techniques to compress data sets

Other Title(s)

استخدام تقنيات التجميع و قواعد الربط لضغط مجموعات البيانات

Dissertant

Ali, Rasha Subhi

Thesis advisor

Duaymi, Mahdi Kazaz
al-Ubaydi, Ahmad Tariq Sadiq

Committee Members

al-Zubaydi, Diya A.
al-Alusi, Nida Fulayyih Hasan
Khalid, Lamya Hafiz

University

University of Baghdad

Faculty

College of Science

Department

Department of Computer Science

University Country

Iraq

Degree

Master

Degree Date

2013

English Abstract

Data compression offers many benefits, such as saving space on hard drives and reducing the use of network transmission bandwidth.

In this work, two intelligent techniques, namely clustering and association rules, are used as lossless data compression algorithms.

In the first stage, the database is compressed by using a clustering technique followed by an association rules algorithm.

The first technique partitions the data in the database file and saves them as clusters by using the adaptive k-means algorithm, while the second technique extracts the important rules from each cluster by using the apriori algorithm.
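The two stages described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the adaptive k-means step is replaced here by a plain grouping on one column (the hypothetical helper `group_records`), and `apriori` is a textbook frequent-itemset miner whose `min_support` threshold is an assumed parameter. Each record is turned into a set of `(column_index, value)` items so that equal values in different columns are not confused.

```python
from collections import defaultdict
from itertools import combinations


def group_records(records, key_index):
    """Stand-in for clustering (NOT adaptive k-means): group rows by one column."""
    clusters = defaultdict(list)
    for row in records:
        clusters[row[key_index]].append(row)
    return dict(clusters)


def to_items(row):
    """Represent a row as a set of (column_index, value) items."""
    return frozenset(enumerate(row))


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset seen in >= min_support rows."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-itemset candidates and prune
        # any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count how many rows contain each surviving candidate.
        counts = defaultdict(int)
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Made-up sample rows: (degree, faculty, city).
records = [
    ("BSc", "Science", "Baghdad"),
    ("BSc", "Science", "Basra"),
    ("MSc", "Science", "Baghdad"),
    ("BSc", "Arts", "Baghdad"),
]
clusters = group_records(records, key_index=0)
bsc_patterns = apriori([to_items(r) for r in clusters["BSc"]], min_support=2)
```

In this sketch the frequent itemsets found inside a cluster are the candidates for values that could be stored once instead of once per row.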

Several experiments were carried out on databases of different sizes.

The experiments show that using the adaptive k-means algorithm and the apriori algorithm together gives a better compression ratio and a smaller compressed file size.
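The abstract does not state how the compression ratio is defined. As a hedged illustration, the sketch below uses two common conventions, original size divided by compressed size, and the fraction of space saved; the function names are hypothetical and the thesis may use a different formula.

```python
def compression_ratio(original_bytes, compressed_bytes):
    """Original size divided by compressed size (larger means better)."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / compressed_bytes


def space_saving(original_bytes, compressed_bytes):
    """Fraction of the original size that compression removed."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 1 - compressed_bytes / original_bytes


# Made-up sizes, for illustration only.
ratio = compression_ratio(1000, 250)
saving = space_saving(1000, 250)
```

Under either convention, a smaller compressed file for the same input corresponds to a better score.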

The apriori algorithm increases the compression ratio of the adaptive k-means algorithm when they are used together.

However, the two algorithms together take more compression time than the adaptive k-means algorithm alone.

Also, applying the apriori algorithm to the original database takes more compression time than applying it to the results of the adaptive k-means algorithm.

The adaptive k-means algorithm takes less time than the apriori algorithm, whether the latter is applied to the original database or to the adaptive k-means results.

The adaptive k-means algorithm handles any data type and does not need to compute the distance between each data point and the cluster center, whereas the traditional methods handle only numeric data.

A full cycle of the proposed algorithms compresses a database file, then decompresses the compressed file and returns it identical to the original file.
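The lossless round trip can be sketched as follows. This is an assumed encoding for illustration, not the thesis's file format: a shared pattern (a hypothetical dict mapping column index to value, such as one found by rule mining) is stored once per cluster, matching rows keep only their remaining columns, and non-matching rows are stored unchanged.

```python
def compress_cluster(rows, pattern):
    """Encode rows against a shared pattern {column_index: value}.

    A flag per row records whether the pattern was factored out, so every
    row can be rebuilt exactly.
    """
    encoded = []
    for row in rows:
        if all(row[i] == v for i, v in pattern.items()):
            rest = tuple(v for i, v in enumerate(row) if i not in pattern)
            encoded.append((True, rest))
        else:
            encoded.append((False, tuple(row)))
    width = len(rows[0]) if rows else 0
    return {"pattern": dict(pattern), "width": width, "rows": encoded}


def decompress_cluster(blob):
    """Rebuild the original rows, in their original order."""
    pattern, width = blob["pattern"], blob["width"]
    rows = []
    for matched, payload in blob["rows"]:
        if matched:
            rest = iter(payload)
            # Pattern columns come from the shared pattern, the others
            # from the stored remainder, in column order.
            rows.append(
                tuple(pattern[i] if i in pattern else next(rest) for i in range(width))
            )
        else:
            rows.append(tuple(payload))
    return rows


# Made-up sample rows: (degree, faculty, city).
rows = [
    ("BSc", "Science", "Baghdad"),
    ("BSc", "Science", "Basra"),
    ("BSc", "Arts", "Baghdad"),
]
blob = compress_cluster(rows, {0: "BSc", 1: "Science"})
restored = decompress_cluster(blob)
```

Comparing `restored` with `rows` is the kind of identity check that a lossless scheme must pass. The sketch assumes rows are tuples of equal length.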

The first decompression algorithm is the adaptive k-means decompression.

The second decompression algorithm is the apriori decompression algorithm.

They aim to recover the original data from the compressed data or, when the second decompression algorithm is used, to recover the original cluster data from the compressed data.

The results obtained from the experiments show that the compression time is less than the decompression time.

The proposed approach was applied to a real database, such as a database of employees by educational achievement.

Main Subjects

Information Technology and Computer Science

No. of Pages

92

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction to data compression.

Chapter Two : Clustering techniques and association rules algorithms.

Chapter Three : The proposed database compression approach.

Chapter Four : Experiments and results.

Chapter Five : Conclusions and future work.

References.

American Psychological Association (APA)

Ali, Rasha Subhi. (2013). Using clustering and association rules techniques to compress data sets. (Master's thesis). University of Baghdad, Iraq
https://search.emarefa.net/detail/BIM-605855

Modern Language Association (MLA)

Ali, Rasha Subhi. Using clustering and association rules techniques to compress data sets. (Master's thesis). University of Baghdad. (2013).
https://search.emarefa.net/detail/BIM-605855

American Medical Association (AMA)

Ali, Rasha Subhi. (2013). Using clustering and association rules techniques to compress data sets. (Master's thesis). University of Baghdad, Iraq
https://search.emarefa.net/detail/BIM-605855

Language

English

Data Type

Arab Theses

Record ID

BIM-605855