A hybrid approach for web change detection

Thesis author

al-Qayidah, Sakhr Khalil

Thesis supervisors

al-Fayyumi, Muhammad Ahmad
Hattab, Izz al-Din Shakir Hasan

Committee members

Ulwan, Rad
Aqil, Misbah

University

Middle East University

Faculty

Faculty of Information Technology

Academic department

Department of Computer Information Systems

University country

Jordan

Degree

Master's

Degree date

2009

English abstract

Search engines save copies of web pages to facilitate searching for information in these pages.

Techniques for creating and editing web pages have become easier, and they are now available to everyone.

Web servers that host web pages do not report new changes to search engines, because they are under no obligation to do so.

Some of the web pages stored in a search engine's repository therefore come to differ from their original sources. Search engines should save the new versions of edited pages so that the information they serve stays identical to the original source and is not out of date, which matters most for content such as news.

Some solutions to this problem suggest that the search engine predict when a page's content will change, so that it can store the page immediately after the change occurs and avoid re-saving pages that contain no new changes.

A change in a page can be predicted by monitoring the page's changes over time and estimating its change rate.
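As an illustration only, the per-page monitoring idea can be sketched as follows. The abstract does not say how the thesis computes a change rate, so the estimator here (detected changes divided by elapsed time), the use of a content hash to compare versions, and the names `fingerprint` and `estimate_change_rate` are all assumptions.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Hash page content so that versions can be compared cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def estimate_change_rate(snapshots: list[str], interval_days: float) -> float:
    """Return detected changes per day across equally spaced snapshots.

    Assumption: at most one change is counted per interval, so the result
    is a lower bound on the true change rate.
    """
    if len(snapshots) < 2 or interval_days <= 0:
        return 0.0
    hashes = [fingerprint(s) for s in snapshots]
    # Count the intervals in which the content differs from the previous visit.
    changes = sum(1 for prev, cur in zip(hashes, hashes[1:]) if prev != cur)
    return changes / ((len(snapshots) - 1) * interval_days)
```

For example, five daily snapshots in which the content changed twice give an estimate of 0.5 changes per day.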

Another approach to determining change rate is to take a sample of pages from each web site, estimate each site's change rate from the change rate of its sample, and then distribute the search engine's page-saving effort across web sites according to their change rates.
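A minimal sketch of this site-level sampling idea is given below. The abstract states only that effort is distributed according to change rate; the strictly proportional split, the `has_changed` callback, and the function names are assumptions made for illustration.

```python
import random
from typing import Callable


def sample_change_ratio(
    page_urls: list[str],
    has_changed: Callable[[str], bool],
    sample_size: int,
    rng: random.Random,
) -> float:
    """Estimate a site's change rate as the changed fraction of a random sample."""
    if not page_urls or sample_size <= 0:
        return 0.0
    sample = rng.sample(page_urls, min(sample_size, len(page_urls)))
    changed = sum(1 for url in sample if has_changed(url))
    return changed / len(sample)


def allocate_budget(site_ratios: dict[str, float], total_budget: int) -> dict[str, int]:
    """Split a download budget across sites in proportion to their change ratios.

    Assumption: proportional allocation, rounded down; any remainder is unused.
    """
    if not site_ratios:
        return {}
    total = sum(site_ratios.values())
    if total == 0:
        # No site showed any change, so fall back to an even split.
        return {site: total_budget // len(site_ratios) for site in site_ratios}
    return {
        site: int(total_budget * ratio / total)
        for site, ratio in site_ratios.items()
    }
```

With ratios of 0.5, 0.25, and 0.25 and a budget of 100 page downloads, the three sites receive 50, 25, and 25 downloads respectively.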

Another approach groups all web pages from different web sites into clusters by change rate level, then samples pages from each cluster to determine the cluster's change rate.

This thesis presents another approach for better change rate detection in web pages. It combines two of the approaches mentioned above: monitoring changes in each page to determine its change rate, and taking a sample from each web site to determine the site's change rate.

The new approach samples pages from web sites, then monitors changes in each sampled page over time and across many versions of the page, so that the change rate of the sample is detected more accurately.
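The combination described above might look roughly like the sketch below. It is not the thesis's implementation: the abstract does not say how per-page rates are combined into a site-level figure, so averaging them, along with the names `page_change_rate` and `hybrid_site_change_rate`, is an assumption.

```python
import random
from statistics import mean


def page_change_rate(versions: list[str]) -> float:
    """Fraction of consecutive version pairs in which the page content differs."""
    if len(versions) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(versions, versions[1:]) if prev != cur)
    return changes / (len(versions) - 1)


def hybrid_site_change_rate(
    site_histories: dict[str, list[str]],
    sample_size: int,
    rng: random.Random,
) -> float:
    """Sample pages from one site and average their per-page change rates.

    site_histories maps each page URL to its stored versions, oldest first.
    Assumption: the site-level rate is the mean of the sampled page rates.
    """
    urls = sorted(site_histories)
    if not urls or sample_size <= 0:
        return 0.0
    sample = rng.sample(urls, min(sample_size, len(urls)))
    return mean(page_change_rate(site_histories[url]) for url in sample)
```

Compared with checking each sampled page only once, looking at several versions per page lets a page that changes in some intervals but not others contribute a fractional rate instead of a single changed or unchanged vote.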

Experiments demonstrated the effectiveness of the new approach under different search engine conditions, such as limited resources for sampling web sites.

Main subjects

Information Technology and Computer Science

Number of pages

41

Table of contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Sampling policies.

Chapter Three: The proposed policy.

Chapter Four: Experiments.

Chapter Five: Conclusions and future work.

References.

American Psychological Association (APA) citation style

al-Qayidah, Sakhr Khalil. (2009). A hybrid approach for web change detection. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-694210

Modern Language Association (MLA) citation style

al-Qayidah, Sakhr Khalil. A hybrid approach for web change detection. (Master's thesis). Middle East University. (2009).
https://search.emarefa.net/detail/BIM-694210

American Medical Association (AMA) citation style

al-Qayidah, Sakhr Khalil. (2009). A hybrid approach for web change detection. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-694210

Text language

English

Data type

Theses

Record number

BIM-694210