Text entailment-based Arabic text segmentation and summarization system

Thesis author

al-Khawlidah, Fatimah Taha

Thesis advisor

Samawi, Venus W.

Committee members

al-Nihoud, Jihad Quball Awdah
Ababinah, Ismail M.
Ubayd, Nadim Ali Miri

University

Al al-Bayt University

College

Prince Hussein Bin Abdullah College for Information Technology

Academic department

Department of Computer Science

University country

Jordan

Degree

Master's

Degree date

2012

English abstract

Text summarization is the process of creating a short description of a given text while preserving its information content.

Text summarization helps readers identify the significant information within the huge body of material that search engines return.

In this research, we try to remove semantically redundant and insignificant content from the summarized text.

This shortens the summary and saves the reader time and effort.

This can be achieved by checking the text entailment relation and lexical cohesion.

This research is concerned with developing an Arabic text summarization approach based on lexical cohesion and the text entailment relation, using the extraction technique.

The developed approach performs single-document summarization.

A measure of lexical cohesion (semantically related words) can be used to detect and remove unimportant information in order to improve the quality of the summary.

Text entailment is a method for matching two texts in order to check whether the statement of one text can be logically inferred from the other. The developed approach consists of four main phases.

The first phase is preprocessing, which includes removing stop words and extracting the stem of each word.
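The preprocessing phase described above can be illustrated with a minimal sketch. The stop-word list, the prefix and suffix tables, and the function names `light_stem` and `preprocess` are all assumptions made for illustration; the abstract does not say which stop-word list or stemmer the thesis uses, and a real system would rely on a much fuller resource.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are far larger).
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "هذا", "هذه", "التي", "الذي", "و"}

# Affixes stripped by this naive light stemmer (hypothetical, illustrative only).
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ات", "ون", "ين", "ان", "ها", "ية", "ة", "ه", "ي")


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, keeping at least three letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def preprocess(sentence: str) -> list[str]:
    """Tokenize on runs of Arabic letters, drop stop words, stem the rest."""
    tokens = re.findall(r"[\u0621-\u064A]+", sentence)
    return [light_stem(t) for t in tokens if t not in STOP_WORDS]


# "The student is in the school": the preposition is dropped, the rest is stemmed.
print(preprocess("الطالب في المدرسة"))
```

Light affix stripping is only one possible choice; a root-based stemmer would produce different stems.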

In the second phase, each word is assigned its correct sense (meaning) based on the context of the text.

In phase three, the text is divided into topical segments using lexical chains extracted from the text.

The most important sentences are extracted from the most important segments.

In phase four, the text entailment relation is applied.

It measures how much the vocabulary of one sentence overlaps with that of other sentences.

It is computed using the cosine similarity measure (a value between 0 and 1).
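As a rough illustration of the overlap measure, the sketch below computes cosine similarity over raw term counts and uses it to drop sentences that largely repeat an earlier one. The function names, the `threshold` value of 0.8, and the use of plain counts are assumptions; the thesis also draws on Arabic WordNet relations, which this sketch does not model.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence overlaps with nothing
    return dot / (norm_a * norm_b)


def drop_redundant(sentences, threshold=0.8):
    """Keep a sentence only if it is not too similar to any sentence already kept.

    `threshold` is a hypothetical cut-off chosen for illustration.
    """
    kept = []
    for tokens in sentences:
        if all(cosine_similarity(tokens, k) < threshold for k in kept):
            kept.append(tokens)
    return kept
```

Because term counts are never negative, the score stays within the 0 to 1 range mentioned above.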

Arabic WordNet is the knowledge source used for identifying semantic relationships between words in phases two, three, and four. To evaluate the suggested approach, its performance is compared with previous Arabic text summarization systems.

Each system's output is compared against the model summaries of the Essex Arabic Summaries Corpus (EASC), using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and Automatic Summarization Engineering (AutoSummENG) metrics.
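To give a sense of what a recall-oriented comparison against a model summary involves, here is a minimal sketch of unigram recall in the spirit of ROUGE-1. It is not the official ROUGE toolkit, and the function name `rouge_1_recall` is an assumption; inputs are assumed to be already tokenized.

```python
from collections import Counter


def rouge_1_recall(candidate, reference):
    """Fraction of reference unigrams that also occur in the candidate.

    Matches are clipped, so a token repeated in the candidate cannot be
    credited more often than it appears in the reference.
    """
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0


system_summary = ["the", "cat", "sat"]
model_summary = ["the", "cat", "is", "here"]
print(rouge_1_recall(system_summary, model_summary))  # 2 of 4 reference tokens matched: 0.5
```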

A five-level evaluation scale is used to compare the proposed system with the human judge.

The levels are those stated in the Document Understanding Conference (DUC) 2005: Very Poor, Poor, Fair, Good, and Very Good. The results indicate that the developed approach performs better than previous Arabic text summarization systems.

Main subjects

Information Technology and Computer Science

Topics

Number of pages

79

Table of contents

Table of contents.

Abstract.

Chapter One : introduction.

Chapter Two : theoretical concepts.

Chapter Three : the methodology.

Chapter Four : experiments and results evaluation.

Chapter Five : conclusion and future works.

References.

American Psychological Association (APA) citation style

al-Khawlidah, Fatimah Taha. (2012). Text entailment-based Arabic text segmentation and summarization system (Master's thesis). Al al-Bayt University, Jordan
https://search.emarefa.net/detail/BIM-321627

Modern Language Association (MLA) citation style

al-Khawlidah, Fatimah Taha. Text entailment-based Arabic text segmentation and summarization system (Master's thesis). Al al-Bayt University. (2012).
https://search.emarefa.net/detail/BIM-321627

American Medical Association (AMA) citation style

al-Khawlidah, Fatimah Taha. (2012). Text entailment-based Arabic text segmentation and summarization system (Master's thesis). Al al-Bayt University, Jordan
https://search.emarefa.net/detail/BIM-321627

Text language

English

Data type

Theses

Record number

BIM-321627