Using clustering and association rules techniques to compress data sets

Other titles

استخدام تقنيات التجميع و قواعد الربط لضغط مجموعات البيانات

Thesis author

Ali, Rasha Subhi

Thesis advisors

Duaymi, Mahdi Kazaz.
al-Ubaydi, Ahmad Tariq Sadiq

Committee members

al-Zubaydi, Diya A.
al-Alusi, Nida Fulayyih Hasan
Khalid, Lamya Hafiz

University

University of Baghdad

College

College of Science

Academic department

Department of Computer Science

University country

Iraq

Degree

Master's

Degree date

2013

English abstract

Data compression offers many benefits, such as saving space on hard drives and reducing the transmission bandwidth used on a network.

In this work, two intelligent techniques, clustering and association rules, are used as lossless data compression algorithms.

In the first stage, the database is compressed using a clustering technique, followed by an association rules algorithm.

The first technique partitions the data in the database file and saves them as clusters using the adaptive k-means algorithm, while the second technique extracts the important rules from each cluster using the apriori algorithm.
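As a rough illustration of the rule-extraction step, the following is a minimal sketch of Apriori frequent-itemset mining in Python. The function name `apriori_frequent_itemsets`, the `min_support` threshold, and the toy records are assumptions made for illustration; this record does not describe the thesis's own implementation or parameters.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count how many transactions contain each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical cluster of records, each encoded as "column=value" items.
cluster = [
    {"dept=IT", "degree=MSc", "city=Baghdad"},
    {"dept=IT", "degree=MSc", "city=Basra"},
    {"dept=IT", "degree=BSc", "city=Baghdad"},
]
itemsets = apriori_frequent_itemsets(cluster, min_support=2)
```

Itemsets that recur across many records in a cluster are the kind of redundancy a rule-based compressor could store once instead of per record; how the thesis encodes them is not stated here.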

Several experiments were carried out on databases of several different sizes.

The experiments show that using the adaptive k-means algorithm and the apriori algorithm together gives a better compression ratio and a smaller compressed file size.

The apriori algorithm increases the compression ratio of the adaptive k-means algorithm when the two are used together.

However, together they take more compression time than the adaptive k-means algorithm alone.

Also, applying the apriori algorithm to the original database takes more compression time than applying it to the results of the adaptive k-means algorithm.

The adaptive k-means algorithm takes less time than applying the apriori algorithm to either the original database or the adaptive k-means results.

The adaptive k-means algorithm handles any data type and does not need to compute the distance between each data point and the cluster center, whereas traditional methods handle only numeric data.

A full cycle of the proposed algorithms compresses a database file, then decompresses the compressed file and returns it identical to the original.
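The lossless round trip described above can be sketched in Python as follows. This is not the thesis's adaptive k-means or apriori procedure, whose details are not given in this record. It is a simplified stand-in that groups rows by the value of one column, stores that shared value once per group, and keeps each row's original position so that decompression restores the table exactly. The names `compress_by_column` and `decompress`, and the sample rows, are hypothetical.

```python
def compress_by_column(rows, col):
    """Group rows by their value in column `col`, storing that value once per group."""
    clusters = {}
    for index, row in enumerate(rows):
        # Drop the grouping column from the row; it is kept as the cluster key.
        rest = row[:col] + row[col + 1:]
        clusters.setdefault(row[col], []).append((index, rest))
    return {"col": col, "count": len(rows), "clusters": clusters}


def decompress(packed):
    """Rebuild the original rows, in their original order."""
    col = packed["col"]
    rows = [None] * packed["count"]
    for value, members in packed["clusters"].items():
        for index, rest in members:
            # Re-insert the shared value at its original column position.
            rows[index] = rest[:col] + (value,) + rest[col:]
    return rows


# Hypothetical rows with mixed value types; grouping uses equality, not distance.
rows = [
    ("Ali", "IT", "MSc"),
    ("Sara", "HR", "BSc"),
    ("Omar", "IT", 7),
]
packed = compress_by_column(rows, col=1)
restored = decompress(packed)
assert restored == rows  # the round trip is lossless
```

Grouping by equality rather than by a numeric distance is one simple way to handle non-numeric values; whether the adaptive k-means algorithm works this way is an assumption.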

The first decompression algorithm is the adaptive k-means decompression.

The second decompression algorithm is the apriori decompression algorithm.

They aim to recover the original data from the compressed data or, when the second decompression algorithm is used, to recover the original cluster's data from the compressed data.

The results obtained from the experiments show that the compression time is less than the decompression time.

The proposed approach was applied to a real database, such as the database of employees by educational achievement.

Main subjects

Information Technology and Computer Science

Number of pages

92

Table of contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction to data compression.

Chapter Two : Clustering techniques and association rules algorithms.

Chapter Three : The proposed database compression approach.

Chapter Four : Experiments and results.

Chapter Five : Conclusions and future work.

References.

American Psychological Association (APA) citation style

Ali, Rasha Subhi. (2013). Using clustering and association rules techniques to compress data sets. (Master's thesis). University of Baghdad, Iraq
https://search.emarefa.net/detail/BIM-605855

Modern Language Association (MLA) citation style

Ali, Rasha Subhi. Using clustering and association rules techniques to compress data sets. (Master's thesis). University of Baghdad. (2013).
https://search.emarefa.net/detail/BIM-605855

American Medical Association (AMA) citation style

Ali, Rasha Subhi. (2013). Using clustering and association rules techniques to compress data sets. (Master's thesis). University of Baghdad, Iraq
https://search.emarefa.net/detail/BIM-605855

Text language

English

Data type

Theses

Record number

BIM-605855