Arabic search results disambiguation : a supervised approach to unsupervised learning
Dissertant
Thesis advisor
University
Birzeit University
Faculty
Faculty of Engineering and Technology
Department
Department of Computer Science
University Country
Palestine (West Bank)
Degree
Master
Degree Date
2019
English Abstract
Web search engines aim to retrieve results that are relevant to a given query or, more precisely, to the information need behind it.
However, a query can be ambiguous, meaning that it may refer to several different senses.
Search results clustering (SRC) is a powerful approach that dynamically attempts to find groups of sense-relevant results.
The preprocessing stage of SRC strongly affects its effectiveness. Although SRC has been widely studied, the research has not yet clearly shown the best source from which features should be selected or the best representation for those features.
Moreover, little research has addressed Arabic, and Arabic datasets are lacking.
The major contributions of this thesis are fourfold: 1) It examines the influence of feature source (i.e., title, snippet, etc.) and feature representation on the effectiveness of SRC, identifying the best combination for high-quality clustering of Arabic Web search results.
2) It introduces a set of benchmarks for Arabic, called AMBIGArabic, and a new framework, called Spread, for data labeling, search results acquisition, and performing SRC experiments.
3) It shows how useful the blind relevance feedback concept is in SRC.
4) Lastly, it proposes a new SRC approach, called SAUL, along with an implementation of this approach based on Wikipedia as a source of the senses.
The results show that feature sources and feature representations significantly affect the effectiveness of SRC, and combinations like (title with snippet, single words) and (title with snippet, single words with 2-gram and 3-gram words) are amongst the best.
Moreover, when the best combinations are compared, the proposed approach outperforms the baseline approach.
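The feature combinations named in the abstract (title with snippet, represented as single words or as single words plus 2-gram and 3-gram words) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `ngram_features`, the whitespace tokenization, and the sample result are assumptions made for illustration, and real Arabic text would likely need normalization and stemming first.

```python
from collections import Counter


def ngram_features(title, snippet, max_n=3):
    """Count word n-grams (n = 1..max_n) over the title followed by the snippet.

    Assumption: tokens are split on whitespace, and the title and snippet
    are treated as one token sequence, so some n-grams span the boundary.
    """
    tokens = f"{title} {snippet}".split()
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


# Hypothetical search result for an ambiguous query (not taken from AMBIGArabic).
result = {"title": "jaguar car", "snippet": "jaguar car dealer"}

# (title with snippet, single words)
words_only = ngram_features(result["title"], result["snippet"], max_n=1)

# (title with snippet, single words with 2-gram and 3-gram words)
words_to_trigrams = ngram_features(result["title"], result["snippet"], max_n=3)
```

Each result's feature counts could then be turned into a vector and passed to any clustering algorithm; which algorithm and weighting the thesis uses is not stated in this record.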
Main Subjects
Information Technology and Computer Science
Topics
No. of Pages
139
Table of Contents
Table of contents.
Abstract.
Chapter One : Introduction.
Chapter Two : Background.
Chapter Three : Literature review.
Chapter Four : Data collection : AMBIGArabic.
Chapter Five : Experimental design and methodology.
Chapter Six : Spread framework.
Chapter Seven : Evaluation and statistics.
Chapter Eight : Conclusion and outlook.
References
American Psychological Association (APA)
Salihi, Haytham. (2019). Arabic search results disambiguation : a supervised approach to unsupervised learning. (Master's thesis). Birzeit University, Palestine (West Bank)
https://search.emarefa.net/detail/BIM-958504
Modern Language Association (MLA)
Salihi, Haytham. Arabic search results disambiguation : a supervised approach to unsupervised learning. (Master's thesis). Birzeit University. (2019).
https://search.emarefa.net/detail/BIM-958504
American Medical Association (AMA)
Salihi, Haytham. (2019). Arabic search results disambiguation : a supervised approach to unsupervised learning. (Master's thesis). Birzeit University, Palestine (West Bank)
https://search.emarefa.net/detail/BIM-958504
Language
English
Data Type
Arab Theses
Record ID
BIM-958504