An enhancement over BIRCH hierarchical clustering algorithms for better partitioning of medical data

Dissertant

al-Nusur, Rad Muhammad Jamil

Thesis advisor

al-Sharruf, Fayiz

University

Isra University

Faculty

Faculty of Information Technology

Department

Department of Software Engineering

University Country

Jordan

Degree

Master

Degree Date

2020

English Abstract

Over the years, technology has revolutionized our world and daily lives. Information is becoming more accessible and more widely shared with the public, and big data across the web is being collected and stored in all forms, from text to various media files. Machine learning algorithms use this data to learn more about it, which in turn can make these algorithms more useful and applicable in the real world. Clustering algorithms are unsupervised machine learning algorithms that can be used in many fields, including pattern recognition and image analysis. There are many clustering algorithms, such as K-means and Agglomerative Hierarchical Clustering (AHC); however, they work well only on specific data sets.

Clustering algorithms can be applied to medical data to find undiscovered patterns, which in turn improves the medical field's knowledge about patients and different diseases. This thesis focuses on one of the most dangerous diseases, cancer. The SEER databases provide a large amount of data on cancer patients, from 1973 until now, drawn from various locations and sources throughout the United States. Finding useful patterns in this data requires a clustering algorithm that can handle such big data, and BIRCH is one of the most effective clustering algorithms for big data.

This thesis investigates the development of new technologies and proposes the MD-BIRCH algorithm, an enhanced version of the BIRCH algorithm that applies the Manhattan distance across multiple phases of BIRCH: the early stage of compacting data points into an initial Clustering Feature (CF) tree, the middle stage of descending deeper into the tree, and the late stages of removing outliers and performing global clustering on the whole tree with another modified clustering algorithm based on the Manhattan distance.
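The role of the Manhattan distance in a CF tree can be illustrated with a minimal sketch. This is not the thesis's implementation, which the abstract does not describe in detail. The names `ClusteringFeature`, `manhattan`, and `closest_entry` are hypothetical and chosen only for illustration. The sketch assumes the standard BIRCH clustering feature (point count, linear sum, sum of squares) and assumes that, when descending the tree, a new point is routed to the entry whose centroid is nearest in Manhattan distance.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Standard BIRCH summary of a subcluster: (N, LS, SS)."""
    n: int                    # number of points summarized
    linear_sum: List[float]   # per-dimension sum of the points
    square_sum: float         # sum of squared norms of the points

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "ClusteringFeature":
        values = [float(x) for x in point]
        return cls(1, values, sum(x * x for x in values))

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CFs are additive, so merging two subclusters is a component-wise sum.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            self.square_sum + other.square_sum,
        )


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def closest_entry(entries: Sequence[ClusteringFeature],
                  point: Sequence[float]) -> int:
    """Index of the entry whose centroid is nearest to `point` in L1 distance.

    Assumption: this is one plausible way to use Manhattan distance when
    choosing which CF entry absorbs a new point; the thesis may differ.
    """
    return min(range(len(entries)),
               key=lambda i: manhattan(entries[i].centroid(), point))
```

For example, merging the points (0, 0) and (2, 4) gives a CF with centroid (1, 2), and a new point (9, 9) would be routed to a second entry centered at (10, 10) rather than to that merged entry.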

The experiments were conducted on the SEER medical dataset over multiple clustering iterations: BIRCH and MD-BIRCH were each executed 8 times on a big data sample of cancer patients. The results showed that MD-BIRCH outperformed BIRCH in terms of quality and achieved slightly better performance.

This work was implemented in the Python 3.7 programming language.

Main Subjects

Information Technology and Computer Science

No. of Pages

52

Table of Contents

Table of contents.

Abstract.

Chapter One : Introduction.

Chapter Two : Literature review.

Chapter Three : Methodology.

Chapter Four : Design, analysis and implementation.

Chapter Five : Results.

Chapter Six : Conclusion and future work.

References.

American Psychological Association (APA)

al-Nusur, Rad Muhammad Jamil. (2020). An enhancement over BIRCH hierarchical clustering algorithms for better partitioning of medical data. (Master's thesis). Isra University, Jordan
https://search.emarefa.net/detail/BIM-985129

Modern Language Association (MLA)

al-Nusur, Rad Muhammad Jamil. An enhancement over BIRCH hierarchical clustering algorithms for better partitioning of medical data. (Master's thesis). Isra University. (2020).
https://search.emarefa.net/detail/BIM-985129

American Medical Association (AMA)

al-Nusur, Rad Muhammad Jamil. (2020). An enhancement over BIRCH hierarchical clustering algorithms for better partitioning of medical data. (Master's thesis). Isra University, Jordan
https://search.emarefa.net/detail/BIM-985129

Language

English

Data Type

Arab Theses

Record ID

BIM-985129