Text entailment-based Arabic text segmentation and summarization system Al al-Bayt University
Dissertant
Thesis advisor
Committee Members
al-Nihoud, Jihad Quball Awdah
Ababinah, Ismail M.
Ubayd, Nadim Ali Miri
University
Al al-Bayt University
Faculty
Prince Hussein Bin Abdullah Faculty for Information Technology
Department
Department of Computer Science
University Country
Jordan
Degree
Master
Degree Date
2012
English Abstract
Text summarization is the process of creating a short description of a specified text while preserving its information context.
Text summarization helps readers identify the significant information within the huge body of information that search engines return.
In this research, we try to remove semantically redundant and insignificant content from the summarized text.
This shortens the summary and saves the reader time and effort.
This can be achieved by checking the text entailment relation and lexical cohesion.
This research is concerned with developing an Arabic text summarization approach based on lexical cohesion and the text entailment relation, using the extraction technique.
The developed approach performs single-document summarization.
A measure of lexical cohesion (semantically related words) can be used to detect and remove unimportant information in order to improve the quality of the summary.
Text entailment is a method for matching two texts in order to check if the statement of one text is logically inferred by another. The developed approach mainly consists of four phases.
The first phase is the preprocessing phase, which includes removing stop words and extracting the stem of each word.
In the second phase, each word is assigned its correct sense (meaning) based on the context of the text.
In phase three, the text is divided into segments (according to topics) using the lexical chains which are extracted from the text.
The most important sentences are extracted from the most important segments.
In phase four, the text entailment relation is applied.
It measures how much the vocabulary of one sentence overlaps with that of the other sentences.
It is computed using the cosine similarity measure, which ranges from 0 to 1.
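As a rough illustration of the cosine-based overlap check described above, the following Python sketch treats each sentence as a bag of already preprocessed tokens and drops a sentence when it is too similar to one already kept. The function names, the greedy filtering strategy, and the 0.8 threshold are assumptions made for this example; the abstract does not specify them, and the use of Arabic WordNet is not modeled here.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists using raw term counts (0 to 1)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence overlaps with nothing
    return dot / (norm_a * norm_b)


def drop_redundant(sentences, threshold=0.8):
    """Keep a sentence only if it is not too similar to any kept sentence.

    `threshold` is a hypothetical cut-off chosen for illustration.
    """
    kept = []
    for tokens in sentences:
        if all(cosine_similarity(tokens, k) < threshold for k in kept):
            kept.append(tokens)
    return kept
```

For example, calling `drop_redundant` on three tokenized sentences where the second repeats the first would keep only the first and the third.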
Arabic WordNet is the knowledge source used to identify semantic relationships between words in phases two, three, and four. To evaluate the suggested approach, its performance is compared with that of previous Arabic text summarization systems.
Each system's output is compared against the model summaries of the Essex Arabic Summaries Corpus (EASC), using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and Automatic Summarization Engineering (Auto Summing) metrics.
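To make the ROUGE comparison concrete, here is a minimal Python sketch of ROUGE-1 recall, the fraction of the model summary's unigrams that also appear in the system summary. The function name and the assumption that both summaries are already tokenized are choices made for this example; the abstract does not state which ROUGE variants or settings were used.

```python
from collections import Counter


def rouge_1_recall(candidate_tokens, reference_tokens):
    """Unigram recall of a candidate summary against one reference summary."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    if not ref:
        return 0.0  # no reference unigrams to recall
    # Clip each match at the number of times the token occurs in the candidate.
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / sum(ref.values())


# Two of the four reference tokens appear in the candidate.
print(rouge_1_recall(["a", "b", "c"], ["a", "b", "d", "e"]))  # 0.5
```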
Five evaluation levels are used to compare the proposed system with human judgment.
These levels are defined in the Document Understanding Conference (DUC) 2005 as Very Poor, Poor, Fair, Good, and Very Good. The results indicate that the developed approach performs better than previous Arabic text summarization systems.
Main Subjects
Information Technology and Computer Science
Topics
No. of Pages
79
Table of Contents
Table of contents.
Abstract.
Chapter One : introduction.
Chapter Two : theoretical concepts.
Chapter Three : the methodology.
Chapter Four : experiments and results evaluation.
Chapter Five : conclusion and future works.
References.
American Psychological Association (APA)
al-Khawlidah, Fatimah Taha. (2012). Text entailment-based Arabic text segmentation and summarization system Al al-Bayt University. (Master's thesis). Al al-Bayt University, Jordan
https://search.emarefa.net/detail/BIM-321627
Modern Language Association (MLA)
al-Khawlidah, Fatimah Taha. Text entailment-based Arabic text segmentation and summarization system Al al-Bayt University. (Master's thesis). Al al-Bayt University. (2012).
https://search.emarefa.net/detail/BIM-321627
American Medical Association (AMA)
al-Khawlidah, Fatimah Taha. (2012). Text entailment-based Arabic text segmentation and summarization system Al al-Bayt University. (Master's thesis). Al al-Bayt University, Jordan
https://search.emarefa.net/detail/BIM-321627
Language
English
Data Type
Arab Theses
Record ID
BIM-321627