Non-numerical clustering using modified overlapping partitioning cluster algorithm

Other Title(s)

التوزيع العنقودي للبيانات غير الرقمية باستخدام الخوارزمية المعدلة للتقسيم العنقودي المتداخل

Dissertant

al-Aqtash, Muhammad Yahya

Thesis advisor

al-Azami, Muayyad A. Fadhil

Committee Members

al-Badarinah, Amir F.
Maush, Murad

University

Philadelphia University

Faculty

Faculty of Information Technology

Department

Department of Computer Science

University Country

Jordan

Degree

Master

Degree Date

2016

English Abstract

Clustering is considered an important problem in unsupervised learning.

It aims to find structure in unlabeled data.

Such a structure places data into groups of similar data objects.

Data objects in one group are dissimilar from data objects in other groups.

Criteria must be defined to distinguish between data objects; similarity, for example, has been defined either as a geometric distance between data objects or as a characteristic they share.

Over the past decades, different clustering techniques, such as hierarchical, partitioning, and grid-based clustering, have been introduced.

Each addresses different applications in different ways.

In this research, we use the Overlapping Partitioning Clustering (OPC) algorithm, which is based on the partitioning clustering technique.

OPC was introduced with two main goals: (1) to maximize the number of objects belonging to one cluster, and (2) to maximize the distance between cluster centers.

The OPC algorithm is considered non-exhaustive, as it allows some objects not to belong to any cluster.

It is also overlapping, since one object may belong to more than one cluster.
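
The two properties above can be illustrated with a minimal sketch. It is not the OPC algorithm itself: the function name `assign_overlapping`, the fixed `radius` threshold, and the sample points are assumptions made only for illustration. Each object joins every cluster whose center lies within the radius, so an object may end up in several clusters (overlapping) or in none (non-exhaustive).

```python
from math import dist

def assign_overlapping(points, centers, radius):
    """Map each point index to the list of cluster indices within `radius`.

    Hypothetical helper: a distance threshold is only one possible
    membership rule and is not taken from the thesis.
    """
    membership = {}
    for i, p in enumerate(points):
        membership[i] = [k for k, c in enumerate(centers) if dist(p, c) <= radius]
    return membership

# Illustrative data (assumed, not from the thesis).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
centers = [(0.0, 0.0), (2.0, 0.0)]
membership = assign_overlapping(points, centers, radius=1.0)

for index, clusters in membership.items():
    print(index, clusters)
```

Here the point at (1.0, 0.0) lies within the radius of both centers and belongs to both clusters, while the point at (10.0, 10.0) belongs to none.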

A huge amount of textual data has become available in recent years, and the need for algorithms that handle such data has become critical.

OPC is a powerful algorithm, but it only deals with numeric data types.

This limitation motivates research to extend OPC to handle non-numerical data types.

This thesis modifies the original algorithm to handle textual data. The modifications change the data representation in a first stage and the similarity measure in a later stage, to meet the requirements of the data type.
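
The abstract does not name the representation or the similarity measure that were chosen. As a hedged illustration only, one common pairing for textual data is a bag-of-words term-count vector with cosine similarity. The helpers `term_counts` and `cosine_similarity` and the sample sentences below are assumptions, not the thesis's implementation.

```python
from collections import Counter
from math import sqrt

def term_counts(text):
    """Represent a document as a bag of lowercase whitespace-separated terms."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Illustrative documents (assumed, not from the 20-newsgroup data set).
doc_a = term_counts("the quick brown fox")
doc_b = term_counts("the quick red fox")
doc_c = term_counts("stock markets fell sharply")

sim_ab = cosine_similarity(doc_a, doc_b)  # three of four terms shared
sim_ac = cosine_similarity(doc_a, doc_c)  # no terms shared
print(sim_ab, sim_ac)
```

A similarity of this kind replaces geometric distance between numeric points, which is why functions that depend on the distance measure would also need to change.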

Other functions of the original algorithm were also modified to work with the new data representation and similarity measure, especially the smart selection function, which guarantees that the algorithm's goals are achieved.

The modified algorithm was tested on 20-newsgroup, a real data set, and the results show that it clusters the data successfully. The results were also evaluated using two clustering evaluation metrics: V-measure and Adjusted Rand Index.

The modified algorithm was compared with k-means in four aspects, and the final assessment shows that the results are promising.
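
For readers unfamiliar with the second metric, the following is a minimal sketch of the standard Adjusted Rand Index formula in plain Python. It assumes each object carries exactly one true label and one predicted label; how the thesis maps overlapping or unassigned objects to single labels is not stated in the abstract. The function name and sample labels are illustrative.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two hard labelings of the same objects."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pair counts within contingency cells, true classes, and predicted clusters.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)

# Illustrative labels (assumed, not thesis results).
same_up_to_renaming = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
partial_agreement = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])
print(same_up_to_renaming, round(partial_agreement, 4))
```

The index is 1.0 when the two labelings agree up to a renaming of clusters and is close to 0.0 for random assignments, which makes it suitable for comparing a clustering against the known newsgroup categories.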

Main Subjects

Information Technology and Computer Science

No. of Pages

59

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction.

Chapter Two : Literature review.

Chapter Three : Modified overlapping partitioning cluster algorithm.

Chapter Four : Modified OPC architecture and implementation.

Chapter Five : Results, discussion and evaluation.

Chapter Six : Conclusions and future works.

References.

American Psychological Association (APA)

al-Aqtash, Muhammad Yahya. (2016). Non-numerical clustering using modified overlapping partitioning cluster algorithm. (Master's thesis). Philadelphia University, Jordan
https://search.emarefa.net/detail/BIM-725415

Modern Language Association (MLA)

al-Aqtash, Muhammad Yahya. Non-numerical clustering using modified overlapping partitioning cluster algorithm. (Master's thesis). Philadelphia University. (2016).
https://search.emarefa.net/detail/BIM-725415

American Medical Association (AMA)

al-Aqtash, Muhammad Yahya. (2016). Non-numerical clustering using modified overlapping partitioning cluster algorithm. (Master's thesis). Philadelphia University, Jordan
https://search.emarefa.net/detail/BIM-725415

Language

English

Data Type

Arab Theses

Record ID

BIM-725415