Automatic Arabic text summarization for large scale multiple documents using genetic algorithm and MapReduce

Other titles

التلخيص التلقائي للنصوص العربية المتعددة كبيرة الحجم باستخدام الخوارزمية الجينية و MapReduce

Thesis author

al-Brim, Sulayman Nasr Allah Sulayman

Thesis supervisor

Barakah, Ribhi Sulayman

Committee members

Mahmud, Ahmad Yahya
Maghari, Ashraf Yunus

University

Islamic University

Faculty

Faculty of Information Technology

Academic department

Information Technology

University country

Palestine (Gaza Strip)

Degree

Master's

Degree date

2016

English abstract

Automatic text summarization is one of the most important problems in text mining and information retrieval.

The importance of automatic text summarization comes from its ability to provide the most significant information from a large text by reducing the size of textual documents.

Multi-document summarization focuses on extracting the most significant information from a collection of textual documents.

Most summarization techniques require the data to be centralized, which may not be feasible in many cases due to computational and storage limitations.

The rapid growth of data, driven by technological progress and the variety of data sources, makes automatic text summarization of large-scale data a challenging task.

We propose an approach for automatic text summarization of large-scale multiple Arabic documents using a genetic algorithm built on the open-source MapReduce model, a powerful parallel programming model.

We design our approach to ensure scalability, speed, and accuracy in summary generation, and we try to eliminate redundant sentences and to increase the readability and cohesion of the sentences in the summaries.

We evaluate the proposed method using several automatic summarization quality measures: recall, precision, and F-measure.

In addition, we evaluate the parallel computation environment in terms of speedup, efficiency, and scalability.

The experiments resulted in high precision and recall scores.

This indicates that the system successfully identifies the most important sentences.
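As a rough illustration of the quality measures named above, the following minimal sketch computes precision, recall, and F-measure for an extractive summary by comparing the set of selected sentence identifiers with a reference set. The function name `precision_recall_f1` and the sentence-identifier representation are assumptions made for this example; the record does not state which evaluation tooling or matching unit the thesis actually uses.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    system: Iterable[Hashable], reference: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Score an extractive summary against a reference summary.

    Both arguments are collections of sentence identifiers (hypothetical
    representation chosen for this sketch).
    """
    system_set, reference_set = set(system), set(reference)
    overlap = len(system_set & reference_set)  # sentences found in both

    # Precision: share of selected sentences that are in the reference.
    precision = overlap / len(system_set) if system_set else 0.0
    # Recall: share of reference sentences that were selected.
    recall = overlap / len(reference_set) if reference_set else 0.0

    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative identifiers only, not data from the thesis.
    p, r, f = precision_recall_f1([1, 2, 3, 4], [2, 3, 5])
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

High values for both precision and recall mean that the selected sentences largely coincide with the reference, which is the sense in which the abstract says the system identifies the most important sentences.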

In addition, the proposed approach achieves up to a 10x speedup over executing the same code on a single machine.

Therefore, it can deal with large-scale datasets successfully.

Finally, the efficiency score of the proposed approach indicates that the largest data set utilizes up to 62% of the available resources, which is a satisfactory result given the available data set sizes.
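The speedup and efficiency scores mentioned above are commonly defined as the ratio of single-machine to parallel run time, and that ratio divided by the number of workers. The sketch below assumes these standard definitions; the function names, the timings, and the worker count are hypothetical and are not taken from the thesis, which does not state its cluster size in this record.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of single-machine run time to parallel run time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Speedup per worker; 1.0 would mean perfectly linear scaling."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup(t_serial, t_parallel) / workers


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    t1, tp, n = 120.0, 20.0, 8
    print(f"speedup={speedup(t1, tp):.2f}")
    print(f"efficiency={efficiency(t1, tp, n):.2%}")
```

Under these definitions, an efficiency below 100% reflects overhead such as job scheduling and data shuffling, which tends to weigh more heavily when the data set is small relative to the cluster.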

Main subjects

Information Technology and Computer Science

Number of pages

81

Table of contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Theoretical and technical foundation.

Chapter Three: Related works.

Chapter Four: Multi-document summarization system design.

Chapter Five: Implementation and experiments.

Chapter Six: Evaluation.

Chapter Seven: Conclusion and future work.

References.

American Psychological Association (APA) citation style

al-Brim, Sulayman Nasr Allah Sulayman. (2016). Automatic Arabic text summarization for large scale multiple documents using genetic algorithm and MapReduce. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-727246

Modern Language Association (MLA) citation style

al-Brim, Sulayman Nasr Allah Sulayman. Automatic Arabic text summarization for large scale multiple documents using genetic algorithm and MapReduce. (Master's thesis). Islamic University. (2016).
https://search.emarefa.net/detail/BIM-727246

American Medical Association (AMA) citation style

al-Brim, Sulayman Nasr Allah Sulayman. (2016). Automatic Arabic text summarization for large scale multiple documents using genetic algorithm and MapReduce. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-727246

Text language

English

Record type

Theses and dissertations

Record number

BIM-727246