Semi-automatic method for info boxes extraction for Arabic Wikipedia articles

Other titles

طريقة شبه آلية لاستخلاص معلومات مختصرة لمقالات ويكيبيديا العربية (A semi-automatic method for extracting summary information for Arabic Wikipedia articles)

Thesis author

Shublaq, Salim Muhammad Salim

Thesis supervisor

Awad Allah, Riwayah Fawzi

Committee members

Abu-Shaban, Yusuf Nabil
al-Halis, Ala Mustafa

University

Islamic University

Faculty

Faculty of Information Technology

Academic department

Information Technology

Country of university

Palestine (Gaza Strip)

Degree

Master's

Year of degree

2016

English abstract

Arabic is one of the most widely spoken languages in the world: 5% of the world's population speaks it. However, Arabic accounts for only a small share of the content available on the internet.

Wikipedia is a well-known multilingual, web-based, free-content encyclopedia project supported by the Wikimedia Foundation and built on a model of openly editable content. It is one of the largest repositories of human knowledge ever assembled, and its pages rank highly in Google, so they frequently appear in search results. Arabic Wikipedia, however, lacks valuable content compared with the Wikipedia editions in other languages. Moreover, many of its existing articles are stubs containing only one or a few sentences, too short to provide encyclopedic coverage of their subject. Some researchers have worked on increasing and enriching Wikipedia content, but most of these efforts focused on methods that process text in languages other than Arabic.

This research aims to boost online Arabic content and, in particular, to support the editing process in Arabic Wikipedia. Our main objective is to develop a method that suggests content for Arabic Wikipedia articles, either to enrich existing stub pages or to generate new pages that contain an infobox. The proposed methods build on existing techniques in Information Retrieval, Question Answering, and Text Mining to extract key information from relevant documents on the web. The automatically generated content, together with the sources from which it was extracted, will be made available to Wikipedia editors for revision and proofreading before it is added to Wikipedia.

In this research, we focus on enriching the infobox, a summary of key unifying parameters shown in the top-left or top-right corner of an article. We developed four main algorithms to extract an entity's birth and death locations, birth and death dates, and full name. We conducted many experiments to evaluate our methods on articles about named entities in the political domain, and our methods achieved an overall accuracy of 80.3%.

Main subjects

Information Technology and Computer Science

Number of pages

93

Table of contents

Table of contents

Abstract

Abstract in Arabic

Chapter One: Introduction

Chapter Two: Related work

Chapter Three: Proposed methodology for constructing infobox

Chapter Four: System technical implementation

Chapter Five: Results and discussion

Chapter Six: Conclusions and future work

References

American Psychological Association (APA) citation style

Shublaq, Salim Muhammad Salim. (2016). Semi-automatic method for info boxes extraction for Arabic Wikipedia articles (Master's thesis). Islamic University, Palestine (Gaza Strip).
https://search.emarefa.net/detail/BIM-735540

Modern Language Association (MLA) citation style

Shublaq, Salim Muhammad Salim. Semi-automatic method for info boxes extraction for Arabic Wikipedia articles. Master's thesis, Islamic University, 2016.
https://search.emarefa.net/detail/BIM-735540

American Medical Association (AMA) citation style

Shublaq, Salim Muhammad Salim. (2016). Semi-automatic method for info boxes extraction for Arabic Wikipedia articles (Master's thesis). Islamic University, Palestine (Gaza Strip).
https://search.emarefa.net/detail/BIM-735540

Text language

English

Record type

University theses

Record number

BIM-735540