Semi-automatic method for info boxes extraction for Arabic Wikipedia articles
Other Title(s)
طريقة شبه آلية لاستخلاص معلومات مختصرة لمقالات ويكيبيديا العربية (A semi-automatic method for extracting summary information for Arabic Wikipedia articles)
Dissertant
Shublaq, Salim Muhammad Salim
Thesis Advisor and Committee Members
Abu-Shaban, Yusuf Nabil
al-Halis, Ala Mustafa
University
Islamic University
Faculty
Faculty of Information Technology
Department
Information Technology
University Country
Palestine (Gaza Strip)
Degree
Master
Degree Date
2016
English Abstract
Arabic is one of the most widely spoken languages in the world.
About 5% of the world's population speaks Arabic.
However, Arabic accounts for only a small share of the content on the internet.
Wikipedia is a very well-known multilingual, web-based, free-content encyclopedia project supported by the Wikimedia Foundation and based on a model of openly editable content.
It is one of the greatest repositories of human knowledge ever built, and its pages rank highly in Google, so they often appear in search results.
Arabic Wikipedia, one of Wikipedia's language editions, lacks much of the valuable content found in the editions for other languages.
Moreover, many existing articles are stubs containing only one or a few sentences, too short to provide encyclopedic coverage of their subject.
Some researchers have worked on increasing and enriching Wikipedia content, but most of these efforts developed methods for processing text in languages other than Arabic.
This research aims at boosting online Arabic content.
In particular, it aims to boost the editing process in Arabic Wikipedia.
Our main objective is to develop a method for suggesting content for Arabic Wikipedia articles, either to enrich existing stub pages or to generate new pages that contain an infobox.
The proposed methods build on existing methods in Information Retrieval, Question Answering, and Text Mining in order to extract key information from relevant documents on the web.
The automatically generated content, together with the sources it was extracted from, will be made available to Wikipedia editors for review and proofreading before it is added to Wikipedia.
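The extract-then-review workflow described above can be illustrated with a minimal sketch. It assumes that web search results for an entity have already been retrieved as (URL, snippet) pairs, and it ranks candidates by answer redundancy (how many snippets agree on a value), a common Question Answering heuristic that is not necessarily the ranking used in the thesis. The names `BIRTHPLACE_PATTERN` and `suggest_birthplace` and the English example snippets are hypothetical; Arabic text would need Arabic patterns (e.g. around "ولد في", "was born in") and morphological handling.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern: one or more capitalised words following "born in".
# Real Arabic text would need Arabic patterns and morphological analysis.
BIRTHPLACE_PATTERN = re.compile(r"\bborn in ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")


def suggest_birthplace(snippets):
    """Rank candidate birthplaces by how many snippets mention them.

    `snippets` is a list of (source_url, text) pairs, e.g. search-result
    snippets for a query about one entity. Returns (candidate, votes, sources)
    for the best-supported candidate, or None when no snippet matches.
    The source URLs are kept so an editor can verify the suggestion.
    """
    votes = Counter()
    sources = defaultdict(list)
    for url, text in snippets:
        # Count each distinct candidate at most once per snippet.
        for candidate in set(BIRTHPLACE_PATTERN.findall(text)):
            votes[candidate] += 1
            sources[candidate].append(url)
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    return best, count, sources[best]
```

With three snippets, two saying "born in Nablus" and one mentioning a relative "born in Jenin", the function returns `("Nablus", 2, [...])` with the two supporting URLs; the noisy third snippet is outvoted, but an editor still sees the evidence before accepting the value.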
In this research, we focus on enriching the infobox, a summary of common parameters describing the article's subject, displayed in the top-right corner of an article (top-left on right-to-left wikis such as Arabic Wikipedia).
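Once values have been extracted and approved, they still have to be written out as infobox wikitext. The sketch below uses the English Wikipedia `Infobox person` template and its `name`, `birth_date`, `birth_place`, `death_date`, and `death_place` parameters purely for illustration; Arabic Wikipedia uses its own template and parameter names, and the helper `render_infobox` is hypothetical.

```python
def render_infobox(fields, template="Infobox person"):
    """Render extracted fields as infobox wikitext (hypothetical helper).

    `fields` maps template parameter names to values. Parameters whose value
    is empty or None (nothing was extracted) are left out of the output.
    """
    lines = ["{{" + template]
    for name, value in fields.items():
        if value:
            lines.append(f"| {name} = {value}")
    lines.append("}}")
    return "\n".join(lines)
```

For example, `render_infobox({"name": "Example Person", "birth_place": "Nablus", "death_place": None})` yields a three-parameter-free block containing only the `name` and `birth_place` lines between `{{Infobox person` and `}}`, so missing fields stay empty for an editor to fill in manually.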
We developed four main algorithms to extract the birth and death places, the birth and death dates, and the full name of an entity.
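For the birth and death dates, one simple approach (an assumption here, not necessarily one of the thesis's four algorithms) is to look for the parenthesised lifespan that often follows a person's name in biographical text. The sketch below handles only English day-month-year dates such as "(12 March 1920 - 5 June 1990)"; `extract_lifespan`, `DATE`, and `LIFESPAN` are hypothetical, and Arabic text would need Arabic month names (which differ between regions) and possibly Arabic-Indic digits.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# One "day month year" date, e.g. "12 March 1920", with three capture groups.
DATE = r"(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})"
# Two dates inside parentheses, separated by a hyphen or an en dash (U+2013).
LIFESPAN = re.compile(r"\(" + DATE + r"\s*[-\u2013]\s*" + DATE + r"\)")


def extract_lifespan(text):
    """Return (birth_date, death_date) for the first lifespan in `text`, or None."""
    match = LIFESPAN.search(text)
    if match is None:
        return None
    d1, m1, y1, d2, m2, y2 = match.groups()
    try:
        return (date(int(y1), MONTHS[m1], int(d1)),
                date(int(y2), MONTHS[m2], int(d2)))
    except ValueError:
        # e.g. "31 February": matches the pattern but is not a real date.
        return None
```

Returning `datetime.date` objects rather than raw strings lets the values be validated (impossible dates are rejected) and reformatted for whatever date template the target wiki expects.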
We have conducted many experiments to evaluate our methods on articles about named entities in the political domain.
Our methods achieved an overall accuracy of 80.3%.
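The abstract does not state how the overall accuracy is computed. Assuming it is the proportion of extracted infobox values judged correct, pooled over all fields and test articles (a micro-average), it would take the form below, where f ranges over the extracted fields (birth place, death place, birth date, death date, full name), N_f is the number of evaluated values for field f, and C_f is the number judged correct. A per-field (macro) average would generally give a different figure.

```latex
\text{Accuracy} = \frac{\sum_{f} C_f}{\sum_{f} N_f} \times 100\%
```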
Main Subjects
Information Technology and Computer Science
No. of Pages
93
Table of Contents
Table of contents.
Abstract.
Abstract in Arabic.
Chapter One: Introduction.
Chapter Two: Related work.
Chapter Three: Proposed methodology for constructing infobox.
Chapter Four: System technical implementation.
Chapter Five: Results and discussion.
Chapter Six: Conclusions and future work.
References.
American Psychological Association (APA)
Shublaq, Salim Muhammad Salim. (2016). Semi-automatic method for info boxes extraction for Arabic Wikipedia articles (Master's thesis). Islamic University, Palestine (Gaza Strip).
https://search.emarefa.net/detail/BIM-735540
Modern Language Association (MLA)
Shublaq, Salim Muhammad Salim. Semi-automatic method for info boxes extraction for Arabic Wikipedia articles. Master's thesis, Islamic University, 2016.
https://search.emarefa.net/detail/BIM-735540
Language
English
Data Type
Arab Theses
Record ID
BIM-735540