A hybrid approach for web change detection

Dissertant

al-Qayidah, Sakhr Khalil

Thesis advisor

al-Fayyumi, Muhammad Ahmad
Hattab, Izz al-Din Shakir Hasan

Committee Members

Ulwan, Rad
Aqil, Misbah

University

Middle East University

Faculty

Faculty of Information Technology

Department

Department of Computer Information Systems

University Country

Jordan

Degree

Master

Degree Date

2009

English Abstract

Search engines save copies of web pages to facilitate searching for information in these pages.

Techniques for creating and editing web pages have become easier and are now available to everyone.

Web servers that host these pages do not report new changes to search engines, because they are under no obligation to do so.

As a result, some of the pages stored in a search engine's repository come to differ from their original sources. Search engines should therefore save the new versions of edited pages, so that the information they serve stays identical to the original source and is not regarded as outdated, as can happen with news.

Some solutions to this problem have the search engine predict when a page's content will change, so that it can store the page directly after the change occurs and avoid re-saving pages that contain no new changes.

Changes to a page can be predicted by monitoring the page over time and estimating its change rate.
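As an illustration of monitoring a page over time, the following minimal sketch estimates a change rate as the fraction of consecutive visits on which the page content differed. The function names and the hash-based comparison are assumptions made for this example; the record does not state how the thesis compares page versions.

```python
import hashlib


def content_digest(body: str) -> str:
    """Return a SHA-256 digest of the page body (assumed comparison method)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def estimate_change_rate(snapshots: list[str]) -> float:
    """Fraction of consecutive snapshot pairs whose content differs.

    `snapshots` holds the page body as seen on successive visits,
    oldest first. With fewer than two visits no change can be observed,
    so the estimate is 0.0.
    """
    if len(snapshots) < 2:
        return 0.0
    digests = [content_digest(s) for s in snapshots]
    changes = sum(1 for prev, cur in zip(digests, digests[1:]) if prev != cur)
    return changes / (len(digests) - 1)
```

A page visited four times with bodies `["a", "a", "b", "c"]` changed on two of its three revisit intervals, so this estimator gives 2/3.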

Another approach estimates the change rate of each web site by sampling pages from it: the site's change rate is determined from the change rate of its sample, and the search engine's page-saving effort is then distributed across web sites according to their change rates.
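The site-sampling idea can be sketched as follows. This is a hypothetical illustration, not the policy evaluated in the thesis: it assumes each sampled page is simply flagged as changed or unchanged, and that the download budget is split in proportion to each site's sampled change ratio. The names `sample_change_ratio` and `allocate_budget` are invented for the example.

```python
def sample_change_ratio(changed_flags: list[bool]) -> float:
    """Share of sampled pages that were found to have changed."""
    if not changed_flags:
        return 0.0
    return sum(changed_flags) / len(changed_flags)


def allocate_budget(site_samples: dict[str, list[bool]], budget: int) -> dict[str, int]:
    """Split a page-download budget across sites by sampled change ratio.

    Proportional allocation is an assumption of this sketch. Shares are
    rounded down, so a few downloads may remain unassigned.
    """
    ratios = {site: sample_change_ratio(flags) for site, flags in site_samples.items()}
    total = sum(ratios.values())
    if total == 0:
        # No sampled page changed anywhere: nothing to prioritise.
        return {site: 0 for site in ratios}
    return {site: int(budget * ratio / total) for site, ratio in ratios.items()}
```

With samples where half of site A's pages changed, a quarter of site B's, and none of site C's, a budget of 300 downloads is split 200, 100, and 0.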

A further approach groups all web pages from different web sites into clusters by change-rate level, then samples pages from each cluster to determine the cluster's change rate.

This thesis presents a new approach to better change-rate detection for web pages by combining two of the approaches mentioned above: monitoring changes in each page to determine its change rate, and sampling pages from each web site to determine the site's change rate.

The new approach samples pages from web sites, then monitors changes in each sampled page over time and across many versions of the page, so that the change rate of the sample is detected more accurately.
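A rough sketch of how the two ideas might combine is given below. It is an assumption-laden illustration only: each sampled page carries a history of stored versions, a per-page rate is computed from that history, and the site's rate is taken as the mean over its sampled pages. The averaging step and the function names are hypothetical; the abstract does not specify how the thesis aggregates per-page rates.

```python
from statistics import mean


def page_change_rate(versions: list[str]) -> float:
    """Fraction of consecutive stored versions of one page that differ."""
    if len(versions) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(versions, versions[1:]) if prev != cur)
    return changes / (len(versions) - 1)


def site_change_rate(sampled_histories: list[list[str]]) -> float:
    """Mean per-page change rate over the pages sampled from one site.

    Each inner list is the version history of one sampled page, oldest
    first. Using the mean is an assumption of this sketch.
    """
    if not sampled_histories:
        return 0.0
    return mean([page_change_rate(history) for history in sampled_histories])
```

For a site with two sampled pages whose histories are `["v1", "v1", "v2"]` and `["x", "y", "z"]`, the per-page rates are 0.5 and 1.0, giving a site estimate of 0.75. Compared with a single changed-or-unchanged check per sampled page, each page's history contributes a graded value rather than one bit.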

Experiments demonstrated the effectiveness of the new approach in different search engine scenarios, such as when few resources are available for sampling web sites.

Main Subjects

Information Technology and Computer Science

No. of Pages

41

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Sampling policies.

Chapter Three: The proposed policy.

Chapter Four: Experiments.

Chapter Five: Conclusions and future work.

References.

American Psychological Association (APA)

al-Qayidah, Sakhr Khalil. (2009). A hybrid approach for web change detection. (Master's thesis). Middle East University, Jordan.
https://search.emarefa.net/detail/BIM-694210

Modern Language Association (MLA)

al-Qayidah, Sakhr Khalil. A hybrid approach for web change detection. (Master's thesis). Middle East University. (2009).
https://search.emarefa.net/detail/BIM-694210

American Medical Association (AMA)

al-Qayidah, Sakhr Khalil. (2009). A hybrid approach for web change detection. (Master's thesis). Middle East University, Jordan.
https://search.emarefa.net/detail/BIM-694210

Language

English

Data Type

Arab Theses

Record ID

BIM-694210