Performance evaluation of similarity functions for duplicate record detection

Other Title(s)

تقييم أداء دالات الكشف عن التشابه للسجلات المكررة

Dissertant

al-Nuri, Mithaq Kazim

Thesis advisor

Aqil, Misbah M.

Committee Members

Shilbayah, Nidal F.
al-Umari, Ahmad H.

University

Middle East University

Faculty

Faculty of Information Technology

Department

Department of Computer Information Systems

University Country

Jordan

Degree

Master

Degree Date

2011

English Abstract

Duplicate record detection is an important process in data quality.

Its methods usually rely on similarity functions to identify pairs of records, in one or more datasets, that refer to the same real-world entity.

A wide range of similarity functions exists, yet very few studies compare their effectiveness.

In this research, we evaluate the quality of several similarity functions on synthetic datasets using discernability, a measure drawn from approximate querying.

We rely on a semi-automatic method to estimate optimal threshold values.

Experiments were carried out to validate the proposed technique.

The results show that the discernability measure can be used to determine the threshold value and to assess whether one similarity function is better suited to a specific dataset than another.

Main Subjects

Information Technology and Computer Science

No. of Pages

83

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction.

Chapter Three : Duplicate detection framework.

Chapter Four : Analysis and results.

Chapter Five : Conclusion and future work.

References.

American Psychological Association (APA)

al-Nuri, Mithaq Kazim. (2011). Performance evaluation of similarity functions for duplicate record detection. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-694891

Modern Language Association (MLA)

al-Nuri, Mithaq Kazim. Performance evaluation of similarity functions for duplicate record detection. (Master's thesis). Middle East University. (2011).
https://search.emarefa.net/detail/BIM-694891

American Medical Association (AMA)

al-Nuri, Mithaq Kazim. (2011). Performance evaluation of similarity functions for duplicate record detection. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-694891

Language

English

Data Type

Arab Theses

Record ID

BIM-694891