Arabic search results disambiguation : a supervised approach to unsupervised learning

Dissertant

Salihi, Haytham

Thesis advisor

Jarrar, Radi
Yahya, Adnan H.

University

Birzeit University

Faculty

Faculty of Engineering and Technology

Department

Department of Computer Science

University Country

Palestine (West Bank)

Degree

Master

Degree Date

2019

English Abstract

Web search engines aim to retrieve results relevant to a given query or, more precisely, to the information need behind it.

However, the query can be ambiguous, which means it might refer to different meanings or senses.

Search results clustering (SRC) is a powerful approach that dynamically attempts to find groups of sense-relevant results.

The preprocessing stage of SRC strongly affects its effectiveness, and although SRC has been widely studied, research has not yet clearly shown the best source from which features should be selected or the best way to represent them.

Moreover, little research has addressed Arabic, for which datasets are also lacking.

The major contributions of this thesis are fourfold: 1) It examines the influence of feature source (e.g., title or snippet) and feature representation on the effectiveness of SRC, identifying the combination that yields high-quality clustering of Arabic Web search results.

2) It introduces a set of benchmarks for Arabic, called AMBIGArabic, and a new framework, called Spread, for labeling data, acquiring search results, and running SRC experiments.

3) It shows how useful the blind relevance feedback concept is in SRC.

4) Lastly, it proposes a new SRC approach, called SAUL, along with an implementation of this approach based on Wikipedia as a source of the senses.

The results show that feature sources and feature representations significantly affect the effectiveness of SRC, and combinations like (title with snippet, single words) and (title with snippet, single words with 2-gram and 3-gram words) are amongst the best.

Moreover, when the best combinations are compared, the proposed approach outperforms the baseline approach.
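
The feature combinations named in the abstract, such as (title with snippet, single words with 2-gram and 3-gram words), can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the dictionary layout of a search result, the whitespace tokenization, and the sample text are all assumptions made for illustration, and no Arabic-specific preprocessing (normalization, stemming, stop-word removal) is attempted.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return the contiguous n-word sequences in tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(result, sources=("title", "snippet"), ngram_sizes=(1, 2, 3)):
    """Count word n-gram features taken from the chosen fields of one result.

    `result` is a hypothetical dict with fields such as "title" and "snippet".
    N-grams are built per field, so none spans the title/snippet boundary.
    """
    features = Counter()
    for source in sources:
        tokens = result.get(source, "").split()  # naive whitespace tokenization
        for n in ngram_sizes:
            features.update(word_ngrams(tokens, n))
    return features


# Hypothetical search result used only to exercise the functions.
result = {"title": "جامعة بيرزيت", "snippet": "جامعة فلسطينية في بيرزيت"}
features = extract_features(result)
```

Passing `sources=("title",)` or `ngram_sizes=(1,)` selects a different combination, which is the kind of variation the first contribution compares; the resulting counts could then be fed to any clustering algorithm.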

Main Subjects

Information Technology and Computer Science

No. of Pages

139

Table of Contents

Table of contents.

Abstract.

Chapter One : Introduction.

Chapter Two : Background.

Chapter Three : Literature review.

Chapter Four : Data collection : AMBIGArabic.

Chapter Five : Experimental design and methodology.

Chapter Six : Spread framework.

Chapter Seven : Evaluation and statistics.

Chapter Eight : Conclusion and outlook.

References

American Psychological Association (APA)

Salihi, Haytham. (2019). Arabic search results disambiguation : a supervised approach to unsupervised learning. (Master's thesis). Birzeit University, Palestine (West Bank)
https://search.emarefa.net/detail/BIM-958504

Modern Language Association (MLA)

Salihi, Haytham. Arabic search results disambiguation : a supervised approach to unsupervised learning. (Master's thesis). Birzeit University. (2019).
https://search.emarefa.net/detail/BIM-958504

American Medical Association (AMA)

Salihi, Haytham. (2019). Arabic search results disambiguation : a supervised approach to unsupervised learning. (Master's thesis). Birzeit University, Palestine (West Bank)
https://search.emarefa.net/detail/BIM-958504

Language

English

Data Type

Arab Theses

Record ID

BIM-958504