A hybrid approach for web change detection
Dissertant
Thesis advisor
al-Fayyumi, Muhammad Ahmad
Hattab, Izz al-Din Shakir Hasan
Committee Members
University
Middle East University
Faculty
Faculty of Information Technology
Department
Department of Computer Information Systems
University Country
Jordan
Degree
Master
Degree Date
2009
English Abstract
Search engines save copies of web pages to facilitate searching for information in these pages.
Techniques for creating and editing web pages have become easier and are now available to everyone.
Web servers that host these pages do not report new changes to search engines, because they are not obliged to do so.
As a result, some of the pages stored in a search engine's repository come to differ from their original sources. Search engines should therefore save new versions of edited pages, so that the information they serve stays identical to the original source and does not go stale, as news content does.
Some solutions to this problem suggest that the search engine predict when a page's content will change, so that it can store the page directly after the change occurs and avoid re-saving pages that contain no new changes.
A change in a page can be predicted by monitoring the page's changes over time and estimating its change rate.
Another approach to determining change rates is to take a sample of pages from each web site, estimate each site's change rate from the change rate of its sample, and then distribute the search engine's page-saving effort across web sites according to their change rates.
A further approach groups all web pages from different web sites into clusters by change rate level, then samples pages from each cluster to determine the cluster's change rate.
This thesis presents an approach to better change rate detection for web pages that combines two of the approaches mentioned above: monitoring changes in each page to determine its change rate, and taking a sample from each web site to determine the site's change rate.
The new approach samples pages from web sites, then monitors the changes in each sampled page over time and across many versions of the page, so that the change rate of the sample is detected more accurately.
Experiments demonstrated the effectiveness of the new approach under different search engine conditions, such as limited resources for sampling web sites.
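The combination described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the change-rate definition (fraction of consecutive snapshots that differ), the proportional budget split, and the example sites are all assumptions made for illustration.

```python
import random
from typing import Dict, List


def page_change_rate(snapshots: List[str]) -> float:
    """Fraction of consecutive snapshot pairs whose content differs.

    Assumption: a page's history is a list of content strings (or checksums),
    one per visit, oldest first.
    """
    if len(snapshots) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(snapshots, snapshots[1:]) if prev != cur)
    return changes / (len(snapshots) - 1)


def site_change_rate(
    site_history: Dict[str, List[str]], sample_size: int, rng: random.Random
) -> float:
    """Estimate a site's change rate from a random sample of its pages.

    Each sampled page contributes the rate observed over all of its stored
    versions, not just the latest comparison.
    """
    urls = sorted(site_history)
    sample = rng.sample(urls, min(sample_size, len(urls)))
    if not sample:
        return 0.0
    return sum(page_change_rate(site_history[u]) for u in sample) / len(sample)


def allocate_budget(rates: Dict[str, float], budget: int) -> Dict[str, int]:
    """Split a page-download budget across sites in proportion to change rate."""
    total = sum(rates.values())
    if total == 0:
        return {site: budget // len(rates) for site in rates}
    return {site: int(budget * rate / total) for site, rate in rates.items()}


# Hypothetical history: site -> page URL -> snapshots from four visits.
history = {
    "news.example": {"/a": ["v1", "v2", "v3", "v4"], "/b": ["v1", "v1", "v2", "v3"]},
    "docs.example": {"/x": ["v1", "v1", "v1", "v1"], "/y": ["v1", "v1", "v1", "v2"]},
}

rng = random.Random(0)
rates = {site: site_change_rate(pages, 2, rng) for site, pages in history.items()}
allocation = allocate_budget(rates, budget=100)
print(rates)
print(allocation)
```

In this sketch the frequently changing site receives most of the download budget, while the mostly static site keeps a small share so that its change rate can still be re-estimated later.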
Main Subjects
Information Technology and Computer Science
No. of Pages
41
Table of Contents
Table of contents.
Abstract.
Abstract in Arabic.
Chapter One: Introduction.
Chapter Two: Sampling policies.
Chapter Three: The proposed policy.
Chapter Four: Experiments.
Chapter Five: Conclusions and future work.
References.
American Psychological Association (APA)
al-Qayidah, Sakhr Khalil. (2009). A hybrid approach for web change detection. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-694210
Modern Language Association (MLA)
al-Qayidah, Sakhr Khalil. A hybrid approach for web change detection. (Master's thesis). Middle East University. (2009).
https://search.emarefa.net/detail/BIM-694210
American Medical Association (AMA)
al-Qayidah, Sakhr Khalil. (2009). A hybrid approach for web change detection. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-694210
Language
English
Data Type
Arab Theses
Record ID
BIM-694210