Semi-automatic method for info boxes extraction for Arabic Wikipedia articles

Other Title(s)

طريقة شبه آلية لاستخلاص معلومات مختصرة لمقالات ويكيبيديا العربية

Dissertant

Shublaq, Salim Muhammad Salim

Thesis advisor

Awad Allah, Riwayah Fawzi

Committee Members

Abu-Shaban, Yusuf Nabil
al-Halis, Ala Mustafa

University

Islamic University

Faculty

Faculty of Information Technology

Department

Information Technology

University Country

Palestine (Gaza Strip)

Degree

Master

Degree Date

2016

English Abstract

Arabic is one of the most widely spoken languages in the world.

About 5% of the world's population speaks Arabic.

However, Arabic accounts for only a small share of the content on the internet.

Wikipedia is a very well-known multilingual, web-based, free-content encyclopedia project supported by the Wikimedia Foundation and based on a model of openly editable content.

It is one of the greatest repositories of human knowledge ever constructed, and it ranks highly in Google, so its pages often appear in search results.

Arabic Wikipedia, which is part of the Wikipedia website, lacks valuable content compared with the Wikipedia editions in other languages.

In addition, many existing articles are stub pages containing only one or a few sentences, which is too short to provide encyclopedic coverage of a subject.

Some researchers have worked on increasing and enriching the content of Wikipedia, but most of these efforts focused on developing methods that process text in languages other than Arabic.

This research aims at boosting online Arabic content.

In particular, it aims to boost the editing process in Arabic Wikipedia.

Our main objective is to develop a method for suggesting content for Arabic Wikipedia articles, either to enrich existing stub pages or to generate new articles that contain an infobox.

The proposed methods build on existing methods in Information Retrieval, Question Answering, and Text Mining in order to extract key information from relevant documents on the web.

The automatically generated content and the resources from which it is extracted will be made available to Wikipedia editors for revision and proofreading before it is added to Wikipedia.

In this research, we focus on enriching the infobox, which is a summary of some unifying parameters at the top left or right corner of an article.

We developed four main algorithms to extract (birth, death) locations, (birth, death) dates, and the full name of an entity.

We have conducted many experiments to evaluate our methods on articles about named entities in the political domain.

Our results achieved an overall accuracy of 80.3%.

Main Subjects

Information Technology and Computer Science

No. of Pages

93

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Related work.

Chapter Three: Proposed methodology for constructing infobox.

Chapter Four: System technical implementation.

Chapter Five: Results and discussion.

Chapter Six: Conclusions and future work.

References.

American Psychological Association (APA)

Shublaq, Salim Muhammad Salim. (2016). Semi-automatic method for info boxes extraction for Arabic Wikipedia articles. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-735540

Modern Language Association (MLA)

Shublaq, Salim Muhammad Salim. Semi-automatic method for info boxes extraction for Arabic Wikipedia articles. (Master's thesis). Islamic University. (2016).
https://search.emarefa.net/detail/BIM-735540

American Medical Association (AMA)

Shublaq, Salim Muhammad Salim. (2016). Semi-automatic method for info boxes extraction for Arabic Wikipedia articles. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-735540

Language

English

Data Type

Arab Theses

Record ID

BIM-735540