Automatic Arabic text summarization for large scale multiple documents using genetic algorithm and MapReduce
Other Title(s)
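التلخيص التلقائي للنصوص العربية المتعددة كبيرة الحجم باستخدام الخوارزمية الجينية و MapReduce (English: Automatic summarization of large-scale multiple Arabic texts using the genetic algorithm and MapReduce)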
Dissertant
al-Brim, Sulayman Nasr Allah Sulayman
Thesis advisor
Committee Members
Mahmud, Ahmad Yahya
Maghari, Ashraf Yunus
University
Islamic University
Faculty
Faculty of Information Technology
Department
Information Technology
University Country
Palestine (Gaza Strip)
Degree
Master
Degree Date
2016
English Abstract
Automatic text summarization is one of the most important problems in text mining and information retrieval.
Its importance comes from its ability to present the most significant information in a large text while reducing the size of the textual documents.
Multi-document summarization focuses on extracting the most significant information from a collection of textual documents.
Most summarization techniques require the data to be centralized, which may not be feasible in many cases due to computational and storage limitations.
The enormous growth of data driven by technological progress and the variety of data sources makes automatic summarization of large-scale data a challenging task.
We propose an approach for the automatic summarization of large-scale collections of multiple Arabic documents using a genetic algorithm built on the open-source MapReduce model, a powerful parallel programming model.
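The abstract does not specify how sentence features are computed inside MapReduce. The following is a minimal, single-process sketch of the map, shuffle, and reduce stages, assuming a hypothetical scoring scheme in which term frequencies are counted across all documents and each sentence is scored by the mean corpus-wide frequency of its terms. The names `map_phase`, `reduce_phase`, `term_frequencies`, and `score_sentences` are illustrative only; a real deployment would run the mapper and reducer on a cluster such as Hadoop, and Arabic preprocessing (normalization, stop-word removal, stemming) is omitted.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, sentences):
    """Mapper: emit (term, 1) for every token of every sentence in one document."""
    for sentence in sentences:
        for term in sentence:
            yield term, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(term, counts):
    """Reducer: sum the partial counts emitted for one term."""
    return term, sum(counts)

def term_frequencies(corpus):
    """Run map -> shuffle -> reduce over a dict {doc_id: [tokenized sentences]}."""
    intermediate = chain.from_iterable(map_phase(d, s) for d, s in corpus.items())
    return dict(reduce_phase(t, c) for t, c in shuffle(intermediate).items())

def score_sentences(corpus, tf):
    """Hypothetical score: mean corpus-wide frequency of a sentence's terms."""
    scores = {}
    for doc_id, sentences in corpus.items():
        for i, sentence in enumerate(sentences):
            scores[(doc_id, i)] = sum(tf[t] for t in sentence) / max(len(sentence), 1)
    return scores

# Toy corpus of two pre-tokenized documents (placeholder English tokens).
corpus = {
    "d1": [["water", "crisis", "gaza"], ["rain", "today"]],
    "d2": [["water", "crisis", "deepens"]],
}
tf = term_frequencies(corpus)
scores = score_sentences(corpus, tf)
print(tf["water"], scores[("d1", 1)])  # 2 1.0
```

Because this reducer only sums counts, it is associative and commutative, so the same logic could also serve as a combiner to reduce the volume of data shuffled between the map and reduce stages.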
The approach is designed to ensure scalability, speed, and accuracy in summary generation, to eliminate redundant sentences, and to increase the readability and cohesion between the sentences of the summaries.
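The abstract states that the genetic algorithm selects summary sentences while eliminating redundancy, but it does not give the chromosome encoding or fitness function. The sketch below assumes a common choice: a binary chromosome with one bit per sentence, a fixed summary length `k`, and a hypothetical fitness equal to the total sentence score minus `alpha` times the pairwise Jaccard overlap of the chosen sentences. Truncation selection, one-point crossover, and bit-flip mutation are likewise assumptions, not necessarily the operators used in the thesis. The sketch requires at least two sentences, `k <= len(sentences)`, and `pop_size >= 4`.

```python
import random

def jaccard(a, b):
    """Token-set overlap between two sentences, used as a redundancy measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def fitness(chromosome, sentences, scores, alpha=0.5):
    """Hypothetical fitness: summed sentence scores minus a redundancy penalty."""
    chosen = [i for i, bit in enumerate(chromosome) if bit]
    gain = sum(scores[i] for i in chosen)
    redundancy = sum(jaccard(sentences[i], sentences[j])
                     for pos, i in enumerate(chosen) for j in chosen[pos + 1:])
    return gain - alpha * redundancy

def repair(chromosome, k, rng):
    """Randomly switch bits on or off until exactly k sentences are selected."""
    ones = [i for i, bit in enumerate(chromosome) if bit]
    zeros = [i for i, bit in enumerate(chromosome) if not bit]
    rng.shuffle(ones)
    rng.shuffle(zeros)
    while len(ones) > k:
        chromosome[ones.pop()] = 0
    while len(ones) < k:
        i = zeros.pop()
        chromosome[i] = 1
        ones.append(i)
    return chromosome

def genetic_select(sentences, scores, k, pop_size=30, generations=60,
                   mutation_rate=0.1, seed=0):
    """Return the indices of k sentences chosen by a simple generational GA."""
    rng = random.Random(seed)
    n = len(sentences)

    def fit(chromosome):
        return fitness(chromosome, sentences, scores)

    population = [repair([rng.randint(0, 1) for _ in range(n)], k, rng)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fit, reverse=True)
        parents = population[:pop_size // 2]       # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)              # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:       # bit-flip mutation
                i = rng.randrange(n)
                child[i] = 1 - child[i]
            children.append(repair(child, k, rng))
        population = parents + children
    best = max(population, key=fit)
    return [i for i, bit in enumerate(best) if bit]

sentences = [
    ["water", "crisis", "gaza"],
    ["water", "crisis", "gaza", "today"],   # near-duplicate of sentence 0
    ["rain", "expected", "tomorrow"],
    ["power", "cuts", "continue"],
]
scores = [3.0, 2.8, 1.0, 2.6]
summary = genetic_select(sentences, scores, k=2)
print(sorted(summary))  # [0, 3]
```

In this toy input, sentences 0 and 1 have the two highest scores, but their overlap penalty makes the non-redundant pair {0, 3} the fittest summary.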
We evaluate the proposed method using standard automatic summarization quality measures: recall, precision, and F-measure.
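For extractive summaries, recall, precision, and F-measure are often computed over the sentences shared by the system summary and a human reference summary; ROUGE variants apply the same formulas to overlapping n-grams instead. The abstract does not say which unit the thesis used, so this sketch assumes sentence identifiers and a hypothetical helper `precision_recall_f`.

```python
def precision_recall_f(system, reference, beta=1.0):
    """Precision, recall, and F-beta over the sentence IDs in two summaries."""
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f

p, r, f = precision_recall_f(["s1", "s2", "s3", "s4"], ["s2", "s3", "s5"])
print(p, round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

With `beta=1.0` the F-measure is the harmonic mean of precision and recall; larger `beta` values weight recall more heavily.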
In addition, we evaluate the parallel computing environment in terms of speedup, efficiency, and scalability.
The experiments yielded high precision and recall scores.
This indicates that the system successfully identifies the most important sentences.
In addition, the proposed approach achieves a speedup of up to 10x over executing the same code on a single machine.
Therefore, it can handle large-scale datasets successfully.
Finally, the efficiency scores show that, on the largest dataset, the approach utilizes up to 62% of the available resources, which is a satisfactory result given the sizes of the available datasets.
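Speedup and efficiency are conventionally defined as S_p = T_1 / T_p and E_p = S_p / p, where T_1 is the run time on one machine and T_p the run time on p machines (or workers). The abstract reports a speedup of up to 10x and an efficiency of up to 62% without listing the cluster size, so the timings below are invented purely to illustrate the arithmetic and are not the thesis's measurements.

```python
def speedup(t_serial, t_parallel):
    """S_p = T_1 / T_p: how many times faster the parallel run is."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, workers):
    """E_p = S_p / p: the fraction of ideal linear speedup actually achieved."""
    return speedup(t_serial, t_parallel) / workers

# Hypothetical timings in seconds (not taken from the thesis).
t1, t8 = 120.0, 20.0
print(speedup(t1, t8), efficiency(t1, t8, 8))  # 6.0 0.75
```

Efficiency below 1 reflects overhead such as task scheduling and data shuffling, which is why small datasets typically make poorer use of a cluster than large ones.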
Main Subjects
Information Technology and Computer Science
No. of Pages
81
Table of Contents
Table of contents.
Abstract.
Abstract in Arabic.
Chapter One: Introduction.
Chapter Two: Theoretical and technical foundation.
Chapter Three: Related works.
Chapter Four: Multi-document summarization system design.
Chapter Five: Implementation and experiments.
Chapter Six: Evaluation.
Chapter Seven: Conclusion and future work.
References.
American Psychological Association (APA)
al-Brim, Sulayman Nasr Allah Sulayman. (2016). Automatic Arabic text summarization for large scale multiple documents using genetic algorithm and MapReduce (Master's thesis). Islamic University, Palestine (Gaza Strip).
https://search.emarefa.net/detail/BIM-727246
Modern Language Association (MLA)
al-Brim, Sulayman Nasr Allah Sulayman. Automatic Arabic text summarization for large scale multiple documents using genetic algorithm and MapReduce. Master's thesis, Islamic University, 2016.
https://search.emarefa.net/detail/BIM-727246
Language
English
Data Type
Arab Theses
Record ID
BIM-727246