Automatic Arabic text summarization for large scale multiple documents using genetic algorithm and MapReduce

Other Title(s)

التلخيص التلقائي للنصوص العربية المتعددة كبيرة الحجم باستخدام الخوارزمية الجينية و MapReduce

Dissertant

al-Brim, Sulayman Nasr Allah Sulayman

Thesis advisor

Barakah, Ribhi Sulayman

Committee Members

Mahmud, Ahmad Yahya
Maghari, Ashraf Yunus

University

Islamic University

Faculty

Faculty of Information Technology

Department

Information Technology

University Country

Palestine (Gaza Strip)

Degree

Master

Degree Date

2016

English Abstract

Automatic text summarization is one of the most important problems in the area of text mining and information retrieval.

The importance of automatic text summarization comes from its ability to provide the most significant information from a large text by reducing the size of textual documents.

Multi-document summarization focuses on extracting the most significant information from a collection of textual documents.

Most summarization techniques require the data to be centralized, which may not be feasible in many cases due to computational and storage limitations.

The rapid growth of data, driven by technological progress and the variety of data sources, makes automatic text summarization of large-scale data a challenging task.

We propose an approach for automatic text summarization of large-scale Arabic multiple documents using a genetic algorithm based on the open-source MapReduce model, a powerful parallel programming model.

Our approach is designed to ensure scalability, speed, and accuracy in summary generation; it also tries to eliminate redundant sentences and to increase the readability and cohesion of the sentences in the summaries.

We evaluate the proposed method using several automatic summarization quality measures: recall, precision, and F-measure.
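As a minimal sketch of how these three measures are commonly computed for extractive summaries, the example below compares the set of sentences selected by a system against a reference set. The function name, the use of sentence indices, and the sample values are illustrative assumptions and are not taken from the thesis, which may compute the measures differently (for example over words or n-grams).

```python
def precision_recall_f1(selected, reference):
    """Compare system-selected sentence ids with reference sentence ids.

    Both arguments are iterables of hashable sentence identifiers
    (hypothetical; the thesis may use a different unit of comparison).
    """
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)

    # Precision: share of selected sentences that are in the reference.
    precision = overlap / len(selected) if selected else 0.0
    # Recall: share of reference sentences that were selected.
    recall = overlap / len(reference) if reference else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: 4 selected sentences, 5 reference sentences, 3 shared.
p, r, f = precision_recall_f1([0, 2, 5, 7], [0, 2, 3, 5, 9])
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

High precision means few unimportant sentences were selected, and high recall means few important ones were missed; the F-measure balances the two.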

In addition, we evaluate the parallel computation environment in terms of speedup, efficiency, and scalability.

The experiments resulted in high precision and recall scores.

This indicates that the system successfully identifies the most important sentences.

In addition, the proposed approach achieves a speedup of up to 10x over executing the same code on a single machine.

Therefore, it can deal with large-scale datasets successfully.

Finally, the efficiency score indicates that, for the largest data set, the proposed approach utilizes up to 62% of the available resources, which is a satisfying result given the available data set sizes.
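For readers unfamiliar with the parallel measures mentioned above, the sketch below uses the standard textbook definitions: speedup is the single-machine running time divided by the parallel running time, and efficiency is speedup divided by the number of workers. The function names and the timings are hypothetical and do not reproduce the experiments in the thesis, whose exact definitions and cluster size are not stated in this record.

```python
def speedup(t_single, t_parallel):
    """Speedup = single-machine time / parallel time (same workload)."""
    if t_single <= 0 or t_parallel <= 0:
        raise ValueError("running times must be positive")
    return t_single / t_parallel


def efficiency(t_single, t_parallel, workers):
    """Efficiency = speedup / number of workers (1.0 is ideal)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup(t_single, t_parallel) / workers


# Hypothetical timings in seconds, not results from the thesis.
t_single, t_parallel, workers = 1200.0, 150.0, 12
print(f"speedup={speedup(t_single, t_parallel):.1f}x")
print(f"efficiency={efficiency(t_single, t_parallel, workers):.0%}")
```

Under these definitions, efficiency below 100% reflects overhead such as job startup, data shuffling, and idle workers, which tends to weigh more heavily on smaller data sets.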

Main Subjects

Information Technology and Computer Science

No. of Pages

81

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction.

Chapter Two : Theoretical and technical foundation.

Chapter Three : Related works.

Chapter Four : Multi document summarization system design.

Chapter Five : Implementation and experiments.

Chapter Six : Evaluation.

Chapter Seven : Conclusion and future work.

References.

American Psychological Association (APA)

al-Brim, Sulayman Nasr Allah Sulayman. (2016). Automatic Arabic text summarization for large scale multiple documents using genetic algorithm and MapReduce. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-727246

Modern Language Association (MLA)

al-Brim, Sulayman Nasr Allah Sulayman. Automatic Arabic text summarization for large scale multiple documents using genetic algorithm and MapReduce. (Master's thesis). Islamic University. (2016).
https://search.emarefa.net/detail/BIM-727246

American Medical Association (AMA)

al-Brim, Sulayman Nasr Allah Sulayman. (2016). Automatic Arabic text summarization for large scale multiple documents using genetic algorithm and MapReduce. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-727246

Language

English

Data Type

Arab Theses

Record ID

BIM-727246