Web mining based on island genetic algorithms
Other Title(s)
التنقيب في الشبكة العنكبوتية الواسعة باستخدام خوارزميات الجزر الجينية
Dissertant
Thesis advisor
Committee Members
al-Nihoud, Jihad Quball Awdah
al-Rababiah, Mamun S.
Hamidi, Ismail I.
University
Al albayt University
Faculty
Prince Hussein Bin Abdullah Faculty for Information Technology
Department
Department of Computer Science
University Country
Jordan
Degree
Master
Degree Date
2009
English Abstract
People with different information needs use the WWW as a mine of information.
Typically, a query is entered into a search engine, which sifts through millions of Web pages to find relevant ones.
Because the WWW changes daily, both in the amount of information and in the contents of Web pages, a traditional search engine may present low-quality Web pages among the results it generates.
In addition, it needs more time every day to find what a user is looking for.
Consequently, a technique is needed to improve the results of the traditional search engine, either in the time consumed or in the degree of relevance of the presented Web pages.
This could be done by applying genetic algorithms to Web mining.
Various studies have addressed this field by applying genetic algorithms to Web content mining, Web usage mining, or Web structure mining, but none of them used parallel genetic algorithms. A parallel approach, such as the island genetic algorithm, may yield promising results, not only in gathering relevant Web pages but also in decreasing the time needed to search a huge information repository like the WWW.
In this research, four islands with different selection methods and fitness functions are applied to Web content mining: island-one uses random tournament selection and Jaccard's coefficient; island-two uses Ochiai's coefficient with the same random tournament selection; island-three uses unbiased tournament selection and Jaccard's coefficient; and island-four uses Ochiai's coefficient as the fitness function with the same unbiased tournament selection as island-three.
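As a rough illustration of the two fitness measures named above, the following Python sketch computes Jaccard's and Ochiai's coefficients between a query and a page. It assumes both are represented as plain sets of terms; the abstract does not state the representation used in the thesis, and the function names and sample terms are hypothetical.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard's coefficient: |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def ochiai(a: set, b: set) -> float:
    """Ochiai's coefficient: |A ∩ B| / sqrt(|A| * |B|) (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical term sets, for illustration only.
query = {"island", "genetic", "algorithm"}
page = {"genetic", "algorithm", "web", "mining"}

jaccard_score = jaccard(query, page)  # 2 shared terms out of 5 distinct terms
ochiai_score = ochiai(query, page)    # 2 shared terms over sqrt(3 * 4)
```

Both measures reward term overlap, but Ochiai's coefficient normalizes by the geometric mean of the set sizes rather than by the size of the union, so the two can rank the same pages differently.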
Applying these islands to Web content mining will lead to a faster search through the WWW, since the islands may work independently on different servers, which leads to parallel behavior.
In this work, a query expansion technique is used.
The island genetic search is run twice: once before expanding the query and once after query expansion.
This technique improves the Web searching results.
Cosine similarity is used to judge the relevance of the retrieved Web pages.
To improve the final results and obtain more relevant pages, a merging approach is suggested in which the final results generated by the four islands are merged and ranked according to their cosine similarity values.
This research also studies the behavior of the four islands by comparing their retrieval ability in terms of the relevance of the retrieved pages.
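The merging step described in the abstract might be sketched as follows in Python. This is only an illustration under stated assumptions: pages are identified by URL, text is turned into raw term-frequency vectors by whitespace splitting, and a page returned by more than one island is scored once. None of these details are given in the abstract, and `merge_and_rank` and the sample data are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_and_rank(island_results, query_terms):
    """Merge per-island (url, text) results and rank them by cosine similarity."""
    query_vec = Counter(query_terms)
    scored = {}
    for pages in island_results:
        for url, text in pages:
            if url not in scored:  # assumption: score each page only once
                page_vec = Counter(text.lower().split())
                scored[url] = cosine_similarity(query_vec, page_vec)
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical output of two islands, for illustration only.
island_results = [
    [("http://example.com/a", "genetic algorithm web mining")],
    [("http://example.com/b", "cooking recipes"),
     ("http://example.com/a", "genetic algorithm web mining")],
]
ranked = merge_and_rank(island_results, ["genetic", "algorithm"])
```

Here `ranked` lists the page that shares terms with the query first and the unrelated page last, with the duplicate result collapsed into a single entry.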
Main Subjects
Information Technology and Computer Science
No. of Pages
115
Table of Contents
Table of contents.
Abstract.
Chapter One : introduction.
Chapter Two : theoretical concepts.
Chapter Three : methodology.
Chapter Four : experimental results.
Chapter Five : conclusions and future work.
References.
American Psychological Association (APA)
Mizyan, Nuha Marwan Ismail. (2009). Web mining based on island genetic algorithms. (Master's thesis). Al albayt University, Jordan
https://search.emarefa.net/detail/BIM-321554
Modern Language Association (MLA)
Mizyan, Nuha Marwan Ismail. Web mining based on island genetic algorithms. (Master's thesis). Al albayt University. (2009).
https://search.emarefa.net/detail/BIM-321554
American Medical Association (AMA)
Mizyan, Nuha Marwan Ismail. (2009). Web mining based on island genetic algorithms. (Master's thesis). Al albayt University, Jordan
https://search.emarefa.net/detail/BIM-321554
Language
English
Data Type
Arab Theses
Record ID
BIM-321554