An enhancement over BIRCH hierarchical clustering algorithms for better partitioning of medical data

Dissertant

al-Nusur, Rad Muhammad Jamil

Thesis advisor

al-Sharruf, Fayiz

University

Isra University

Faculty

Faculty of Information Technology

Department

Department of Software Engineering

University Country

Jordan

Degree

Master

Degree Date

2020

English Abstract

Over the years, technology has revolutionized our world and daily lives. Information is becoming more accessible and more widely shared with the public, and big data across the web is being collected and stored in all forms, from text to various media files. Machine learning algorithms use this data to learn more about it, which in turn can make the algorithms more useful and applicable in the real world. Clustering algorithms are unsupervised machine learning algorithms that can be used in many fields, including pattern recognition and image analysis. There are many clustering algorithms, such as K-means and Agglomerative Hierarchical Clustering (AHC); however, they work well only on specific data sets.

Clustering algorithms can be applied to medical data to find undiscovered patterns, which in turn improves the medical field's knowledge about patients and diseases. This thesis focuses on one of the most dangerous diseases, cancer. The SEER databases provide a large amount of data on cancer patients, collected from 1973 until now from various locations and sources throughout the United States. Finding useful patterns in this data requires a clustering algorithm that can handle data of this size, and BIRCH is one of the most effective clustering algorithms for big data.

This thesis proposes the MD-BIRCH algorithm, an enhanced version of the BIRCH algorithm that applies Manhattan distance across multiple phases of BIRCH: the early stage of compacting data points into an initial Clustering Feature (CF) tree, the middle stages of descending deeper into the tree, and the late stages of removing outliers and performing global clustering on the whole tree with another modified clustering algorithm based on Manhattan distance.
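The abstract does not give MD-BIRCH's internals, so the following is only a minimal sketch of the general idea of using Manhattan distance when compacting points into clustering features. All names here (`ClusteringFeature`, `manhattan`, `insert_point`, `threshold`) are hypothetical, the flat list of entries stands in for a real CF tree, and the absorption rule (centroid distance within a threshold) is a simplification of BIRCH's radius or diameter test rather than the thesis's method.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Summary of a subcluster: point count, linear sum, squared sum."""
    n: int
    linear_sum: List[float]
    squared_sum: float

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "ClusteringFeature":
        return cls(1, list(point), sum(x * x for x in point))

    def merge(self, other: "ClusteringFeature") -> None:
        # CF additivity: summaries of disjoint subclusters add component-wise.
        self.n += other.n
        self.linear_sum = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]
        self.squared_sum += other.squared_sum

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def insert_point(entries: List[ClusteringFeature],
                 point: Sequence[float],
                 threshold: float) -> None:
    """Absorb the point into the nearest entry, or start a new entry.

    Assumption: "nearest" and "close enough" are both judged by the
    Manhattan distance from the point to each entry's centroid.
    """
    new_cf = ClusteringFeature.from_point(point)
    if entries:
        nearest = min(entries, key=lambda cf: manhattan(cf.centroid(), point))
        if manhattan(nearest.centroid(), point) <= threshold:
            nearest.merge(new_cf)
            return
    entries.append(new_cf)


# Hypothetical usage: two nearby points share an entry, a distant one does not.
entries: List[ClusteringFeature] = []
for p in [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0)]:
    insert_point(entries, p, threshold=3.0)
```

Because each entry stores only the count, linear sum, and squared sum, a point can be absorbed without revisiting earlier points, which is what makes CF-based compaction suitable for large data sets.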

The experiments were conducted on the SEER medical data set over multiple clustering iterations: BIRCH and MD-BIRCH were each executed 8 times on a big-data sample of cancer patients. The results showed that MD-BIRCH outperformed BIRCH in terms of quality and had slightly better performance.

This work was implemented in Python 3.7.

Main Subjects

Information Technology and Computer Science

No. of Pages

52

Table of Contents

Table of contents.

Abstract.

Chapter One : Introduction.

Chapter Two : Literature review.

Chapter Three : Methodology.

Chapter Four : Design, analysis and implementation.

Chapter Five : Results.

Chapter Six : Conclusion and future work.

References.

American Psychological Association (APA) citation style

al-Nusur, Rad Muhammad Jamil. (2020). An enhancement over BIRCH hierarchical clustering algorithms for better partitioning of medical data. (Master's thesis). Isra University, Jordan.
https://search.emarefa.net/detail/BIM-985129

Modern Language Association (MLA) citation style

al-Nusur, Rad Muhammad Jamil. An enhancement over BIRCH hierarchical clustering algorithms for better partitioning of medical data. (Master's thesis). Isra University. (2020).
https://search.emarefa.net/detail/BIM-985129

American Medical Association (AMA) citation style

al-Nusur, Rad Muhammad Jamil. (2020). An enhancement over BIRCH hierarchical clustering algorithms for better partitioning of medical data. (Master's thesis). Isra University, Jordan.
https://search.emarefa.net/detail/BIM-985129

Language

English

Data Type

Theses

Record ID

BIM-985129