Automatic Arabic text summarization

Other Title(s)

تلخيص تلقائي للنصوص العربية

Dissertant

al-Farra, Iyad Jihad Rafiq

Thesis advisor

Makki, Muhammad Amin

Committee Members

al-Hulays, Ala Mustafa
Abu Nasir, Sami Salim

University

Islamic University

Faculty

Faculty of Engineering

Department

Department of Computer Engineering

University Country

Palestine (Gaza Strip)

Degree

Master

Degree Date

2015

English Abstract

Arabic is one of the world's major languages, ranking fifth by number of native speakers.

Producing a good summary of a text is an important task in linguistics because it gives readers the most important passages of the text they want to read.

Some techniques exist for summarizing Arabic text, but they are still few and need improvement.

One approach used in text summarization is graph-based, but it still needs enhancement.

This thesis presents a new algorithm, GBATSS (Graph Based Arabic Text Summarizer), which summarizes Arabic text using NLP techniques and Google's PageRank algorithm.

The system works on three basic units.

These units are rooted-stem, light-stem, and no-stem.

The system uses a compression ratio of 40%.

The summarization process consists of 12 stages, including data collection, text preprocessing, text normalization, text tokenization, stemming, stop-word removal, graph building, edge-weight calculation, PageRank application, and finally summary extraction.
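The graph-building, edge-weighting, ranking, and extraction stages can be illustrated with a minimal Python sketch. It is not the thesis's GBATSS implementation: the function names, the Jaccard-overlap edge weight, the damping factor of 0.85, and the reading of the 40% compression ratio as "keep 40% of the sentences" are all assumptions made for illustration, and stemming and stop-word removal are omitted (roughly the no-stem unit).

```python
import math
import re


def tokenize(sentence):
    # Unicode-aware word split; no stemming or stop-word removal (assumption).
    return set(re.findall(r"\w+", sentence))


def edge_weight(a, b):
    # Hypothetical edge weight: Jaccard overlap of the two token sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    # Weighted PageRank by power iteration over a square weight matrix.
    n = len(weights)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(weights[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(sentences, ratio=0.4):
    # Keep the top-ranked sentences, returned in their original order.
    if not sentences:
        return []
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[edge_weight(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    scores = pagerank(weights)
    keep = max(1, math.ceil(ratio * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]
```

Calling `summarize(sentences)` on a list of sentence strings returns the selected sentences in document order; swapping `tokenize` for a function that applies a root or light stemmer would correspond to the other two basic units.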

Finally, we tested the system on the EASC dataset, using recall, precision, and F-measure for evaluation.
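As a rough illustration of the three evaluation measures, the sketch below computes them over sets of selected sentences. Comparing at the sentence level against a reference extract is an assumption made here; the abstract does not state the unit of comparison, and the function name is hypothetical.

```python
def precision_recall_f(system, reference):
    # Compare the system summary with a reference summary as sets of items.
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    # F-measure as the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For example, if the system selects four sentences and two of them make up the entire reference summary, precision is 0.5 and recall is 1.0.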

The results show that using rooted-stem as the basic unit gives the best results, followed by no-stem and finally light-stem.

Main Subjects

Information Technology and Computer Science

No. of Pages

108

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction.

Chapter Two : Related work.

Chapter Three : Theoretical background.

Chapter Four : Methodology.

Chapter Five : Results, evaluation and discussion.

Chapter Six : Conclusion and future work.

References.

American Psychological Association (APA)

al-Farra, Iyad Jihad Rafiq. (2015). Automatic Arabic text summarization. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-727106

Modern Language Association (MLA)

al-Farra, Iyad Jihad Rafiq. Automatic Arabic text summarization. (Master's thesis). Islamic University. (2015).
https://search.emarefa.net/detail/BIM-727106

American Medical Association (AMA)

al-Farra, Iyad Jihad Rafiq. (2015). Automatic Arabic text summarization. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-727106

Language

English

Data Type

Arab Theses

Record ID

BIM-727106