Performance evaluation of similarity functions for duplicate record detection
Other Title(s)
تقييم أداء دالات الكشف عن التشابه للسجلات المكررة
Dissertant
Thesis advisor
Committee Members
Shilbayah, Nidal F.
al-Umari, Ahmad H.
University
Middle East University
Faculty
Faculty of Information Technology
Department
Department of Computer Information Systems
University Country
Jordan
Degree
Master
Degree Date
2011
English Abstract
Duplicate record detection is an important process in data quality.
Its methods usually rely on similarity functions to identify pairs of records, in one or more datasets, that refer to the same real-world entity.
A wide range of similarity functions exists, but very few studies compare their effectiveness.
In our research we evaluate the quality of a number of similarity functions on synthetic datasets using discernability, a measure from approximate querying.
We rely on a semi-automatic method to estimate optimal threshold values.
Experiments were carried out to validate the proposed technique.
The results show that the discernability measure can determine the threshold value and indicate whether one similarity function is better suited to a specific dataset than another.
Main Subjects
Information Technology and Computer Science
No. of Pages
83
Table of Contents
Table of contents.
Abstract.
Abstract in Arabic.
Chapter One : Introduction.
Chapter Three : Duplicate detection framework.
Chapter Four : Analysis and results.
Chapter Five : Conclusion and future work.
References.
American Psychological Association (APA)
al-Nuri, Mithaq Kazim. (2011). Performance evaluation of similarity functions for duplicate record detection. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-694891
Modern Language Association (MLA)
al-Nuri, Mithaq Kazim. Performance evaluation of similarity functions for duplicate record detection. (Master's thesis). Middle East University. (2011).
https://search.emarefa.net/detail/BIM-694891
American Medical Association (AMA)
al-Nuri, Mithaq Kazim. (2011). Performance evaluation of similarity functions for duplicate record detection. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-694891
Language
English
Data Type
Arab Theses
Record ID
BIM-694891