Web mining based on island genetic algorithms

Other titles

Mining the World Wide Web using island genetic algorithms

Thesis author

Mizyan, Nuha Marwan Ismail

Thesis advisor

Samawi, Venus W.

Committee members

al-Nihoud, Jihad Quball Awdah
al-Rababiah, Mamun S.
Hamidi, Ismail I.

University

Al al-Bayt University

College

Prince Hussein Bin Abdullah College for Information Technology

Academic department

Department of Computer Science

University country

Jordan

Degree

Master's

Degree date

2009

English abstract

People with differing information needs use the WWW as a mine of information.

Typically, a query is entered into a search engine, which sifts through millions of Web pages to find relevant ones.

Because the WWW changes from day to day, both in the amount of information and in the contents of Web pages, a traditional search engine may include low-quality Web pages among the results it generates.

In addition, it needs more time every day to find what a user is looking for.

Consequently, there is a need for a technique that improves the results of a traditional search engine, either in search time or in the relevance of the returned Web pages.

This could be done by applying genetic algorithms to Web mining.

Various studies have addressed this field by applying genetic algorithms to Web content mining, Web usage mining, or Web structure mining, but none of them used parallel genetic algorithms. A parallel approach such as the island genetic algorithm may yield promising results, not only in gathering relevant Web pages but also in reducing the time needed to search a huge information repository like the WWW.

In this research, four islands with different selection methods and fitness functions are applied to Web content mining: island one uses random tournament selection and Jaccard's coefficient; island two uses random tournament selection and Ochiai's coefficient; island three uses unbiased tournament selection and Jaccard's coefficient; and island four uses unbiased tournament selection and Ochiai's coefficient.
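The two fitness coefficients named above can be illustrated with a minimal sketch. It assumes that a query and a Web page are each represented as a set of terms; that representation, the function names, and the sample terms are illustrative assumptions and are not taken from the thesis.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def ochiai(a: set, b: set) -> float:
    """Ochiai coefficient: |A ∩ B| / sqrt(|A| * |B|) (0.0 when either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical term sets, used only to show how the two scores differ.
query_terms = {"island", "genetic", "algorithm"}
page_terms = {"genetic", "algorithm", "web", "mining"}

print(jaccard(query_terms, page_terms))  # 2 shared terms out of 5 distinct terms
print(ochiai(query_terms, page_terms))   # 2 shared terms over sqrt(3 * 4)
```

Both coefficients reward shared terms, but Ochiai normalizes by the geometric mean of the set sizes rather than by the size of the union, so the two can rank the same candidate pages differently.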

Applying these islands to Web content mining should lead to a faster search through the WWW, since the islands can work independently on different servers, which yields parallel behavior.

In this work, a query expansion technique is used.

The island genetic search is run twice: once before the query is expanded and once after query expansion.

This technique improves the Web search results.

Cosine similarity is used to judge the relevance of the retrieved Web pages.
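As a rough illustration of this relevance measure, the sketch below computes cosine similarity over raw term-frequency vectors. The tokenized inputs and the use of plain term counts (rather than, say, TF-IDF weights) are assumptions made for the example; the abstract does not state which weighting the thesis uses.

```python
import math
from collections import Counter


def cosine_similarity(query_tokens, page_tokens):
    """Cosine of the angle between two term-frequency vectors."""
    q = Counter(query_tokens)
    p = Counter(page_tokens)
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(q[t] * p[t] for t in q.keys() & p.keys())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    if norm_q == 0 or norm_p == 0:
        return 0.0
    return dot / (norm_q * norm_p)


# Hypothetical tokens: the page shares one of the two query terms.
print(cosine_similarity(["web", "mining"], ["web", "web", "search"]))
```

A score of 1 means the two vectors point in the same direction, and a score of 0 means the query and the page share no terms.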

To improve the final results and obtain more relevant pages, a merging approach is suggested in which the final results generated by the four islands are merged and ranked by their cosine similarity values.
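The merging step can be sketched as follows. This is an illustrative sketch only: it assumes each island returns a list of (page identifier, cosine similarity) pairs, and it keeps the highest score when the same page is returned by more than one island. The duplicate-handling rule, the function name, and the sample identifiers are assumptions, since the abstract does not specify them.

```python
def merge_and_rank(island_results):
    """Merge per-island (page, score) lists and rank pages by score, highest first."""
    best = {}
    for results in island_results:
        for page, score in results:
            # Assumption: a page found by several islands keeps its best score.
            if page not in best or score > best[page]:
                best[page] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from two of the islands.
island_one = [("page-a", 0.5), ("page-b", 0.9)]
island_two = [("page-a", 0.7), ("page-c", 0.1)]

for page, score in merge_and_rank([island_one, island_two]):
    print(page, score)
```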

This research also studies the behavior of the four islands by comparing their retrieval ability in terms of the relevance of the retrieved pages.

Main subjects

Information technology and computer science

Topics

Number of pages

115

Table of contents

Table of contents.

Abstract.

Chapter One: Introduction.

Chapter Two: Theoretical concepts.

Chapter Three: Methodology.

Chapter Four: Experimental results.

Chapter Five: Conclusions and future work.

References.

American Psychological Association (APA) citation style

Mizyan, Nuha Marwan Ismail. (2009). Web mining based on island genetic algorithms. (Master's thesis). Al al-Bayt University, Jordan
https://search.emarefa.net/detail/BIM-321554

Modern Language Association (MLA) citation style

Mizyan, Nuha Marwan Ismail. Web mining based on island genetic algorithms. (Master's thesis). Al al-Bayt University. (2009).
https://search.emarefa.net/detail/BIM-321554

American Medical Association (AMA) citation style

Mizyan, Nuha Marwan Ismail. (2009). Web mining based on island genetic algorithms. (Master's thesis). Al al-Bayt University, Jordan
https://search.emarefa.net/detail/BIM-321554

Text language

English

Data type

Theses and dissertations

Record number

BIM-321554