Sentences ordering approach for multi-document summarization in domain specific text document

Other Title(s)

Multi-document summarization and sentence ordering in the domain of document texts

Dissertant

al-Nuaymi, Hamid Ali Husayn

Thesis advisor

al-Mashayikhi, Akram Uthman

Committee Members

Kanan, Tariq
Kanan, Ghassan

University

Amman Arab University

Faculty

College of Computer Sciences and Informatics

Department

Department of Computer Science

University Country

Jordan

Degree

Master

Degree Date

2016

English Abstract

This thesis presents three approaches to sentence-ordering summarization, one of which is a novel graph-based formulation.

In the first approach, we compute a normalized importance score for each sentence (TF-IDF threshold tf = 0.0), based on different semantic similarity measures and semantic features (with the settings cosine-normal and 0.4-train), in order to choose the sentences that best represent the document.
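As a rough illustration of TF-IDF and cosine-based sentence scoring, the sketch below scores each sentence by its cosine similarity to the centroid of all sentence vectors and then normalizes the scores so that the best sentence gets 1.0. The centroid comparison, the smoothed IDF formula, the whitespace tokenizer, and the function names are all assumptions made for this example; the thesis's own features and threshold settings are not modeled here.

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict) per sentence.

    Assumption: each sentence is treated as its own "document" and
    tokens are obtained by lowercasing and splitting on whitespace.
    """
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps every weight strictly positive.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def importance_scores(sentences):
    """Score sentences by similarity to the centroid, scaled to a maximum of 1.0."""
    vectors = tfidf_vectors(sentences)
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)  # Counter.update adds the float weights
    raw = [cosine(vec, centroid) for vec in vectors]
    top = max(raw) if raw else 0.0
    return [r / top if top > 0 else 0.0 for r in raw]

if __name__ == "__main__":
    demo = ["the cat sat on the mat",
            "the cat ate the fish",
            "stock markets fell sharply today"]
    for sentence, score in zip(demo, importance_scores(demo)):
        print(f"{score:.3f}  {sentence}")
```

In this toy input the third sentence shares no vocabulary with the others, so it scores lowest; a selection step would then keep the highest-scoring sentences up to the summary length.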

A stack decoder algorithm (summary length = 100, sentence length = 6) is used as the model and built upon to create the summaries closest to the original document.

In the second approach, the sentences are clustered (K-means clustering) by semantic similarity score, and representative sentences selected from each cluster are included in the generated summary.
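The cluster-then-select idea can be sketched with a plain K-means loop followed by picking, for each cluster, the sentence closest to the centroid. This is only an illustration: the dense vectors stand in for whatever sentence representation is used, seeding the centroids with the first k points is a simplification chosen for determinism, and the "closest to centroid" selection rule is an assumption rather than the thesis's stated criterion.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means on dense vectors.

    Assumption: the first k points seed the centroids, which keeps the
    example deterministic; real systems usually use random restarts.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def representatives(points, labels, centroids):
    """Return, for each non-empty cluster, the index of the point nearest its centroid."""
    chosen = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(min(members, key=lambda i: math.dist(points[i], centroid)))
    return sorted(chosen)

if __name__ == "__main__":
    # Hypothetical 2-D sentence embeddings forming two obvious groups.
    vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
    labels, centroids = kmeans(vectors, k=2)
    print(labels, representatives(vectors, labels, centroids))
```

Returning sentence indices rather than the sentences themselves makes it easy to restore document order when the selected sentences are assembled into a summary.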

The third approach is a novel graph formulation (threshold = 0.5) in which the summary is generated from the cliques found in the constructed graph.

The graph is built so that edges connect sentences that cover similar topics but are not semantically similar.
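To make the clique step concrete, the sketch below builds an undirected sentence graph from a pairwise score matrix and enumerates its maximal cliques with the basic Bron-Kerbosch algorithm. The abstract does not say exactly how the 0.5 threshold is applied, so the rule used here (add an edge when the score is at least the threshold) and the matrix of hypothetical topic-relatedness scores are assumptions for illustration only.

```python
def build_graph(scores, threshold=0.5):
    """Create an adjacency map from a symmetric pairwise score matrix.

    Assumption: an edge joins sentences i and j when scores[i][j] >= threshold.
    """
    n = len(scores)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if scores[i][j] >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency

def maximal_cliques(adjacency):
    """Enumerate all maximal cliques (Bron-Kerbosch without pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        # A clique is maximal when nothing can extend it.
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return sorted(cliques)

if __name__ == "__main__":
    # Hypothetical scores for four sentences.
    scores = [
        [1.0, 0.8, 0.7, 0.1],
        [0.8, 1.0, 0.6, 0.1],
        [0.7, 0.6, 1.0, 0.9],
        [0.1, 0.1, 0.9, 1.0],
    ]
    print(maximal_cliques(build_graph(scores)))
```

Each clique groups sentences that are all mutually connected, so a summarizer could draw one sentence per clique to cover distinct groups while limiting redundancy.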

A linear combination of feature values is used as the importance function.

By training on DUC 2002 data, we calculate the weights for the feature values and apply them to score the importance of sentences in the test data.
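A linear importance function of this kind reduces to a weighted sum of per-sentence feature values. The sketch below shows only the scoring and ranking step; the feature names and weight values are hypothetical placeholders, not the features or trained weights from the thesis, and the training procedure itself is not shown.

```python
def importance(features, weights):
    """Weighted sum of feature values; features missing from `weights` count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def rank_sentences(feature_rows, weights):
    """Return sentence indices ordered from most to least important."""
    scored = [(importance(row, weights), index)
              for index, row in enumerate(feature_rows)]
    # Sort by descending score; ties fall back to original sentence order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored]

if __name__ == "__main__":
    # Hypothetical weights, standing in for values learned on training data.
    weights = {"tfidf": 0.8, "position": 0.2}
    # Hypothetical feature values for three test sentences.
    rows = [
        {"tfidf": 0.5, "position": 1.0},
        {"tfidf": 0.9, "position": 0.0},
        {"tfidf": 0.1, "position": 0.5},
    ]
    print(rank_sentences(rows, weights))
```

Keeping the weights in a dictionary keyed by feature name lets the same scoring function be reused unchanged when the weights learned on one dataset are applied to another.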

We apply this approach to produce 100-word summaries of a dataset available as part of DUC 2004, and we discuss the development of the system, its analysis, and its algorithms.

ROUGE scores are used to evaluate the performance of the system.
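For readers unfamiliar with ROUGE, the core of ROUGE-N recall is the fraction of reference n-grams that also appear in the candidate summary, with counts clipped so that a repeated n-gram is not credited more often than it occurs in the candidate. The sketch below is a simplified single-reference version with whitespace tokenization; the official toolkit adds options such as stemming, stopword removal, and multiple references, and the abstract does not state which ROUGE variants were reported.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall against a single reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match at the number of times the n-gram occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total

if __name__ == "__main__":
    system_summary = "the cat sat on the mat"
    human_summary = "the cat lay on the mat"
    print(rouge_n_recall(system_summary, human_summary, n=1))
    print(rouge_n_recall(system_summary, human_summary, n=2))
```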

Main Subjects

Information Technology and Computer Science

No. of Pages

95

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : General framework of the thesis.

Chapter Two : Related work.

Chapter Three : General framework of automatic summarization.

Chapter Four : Summarization methodology.

Chapter Five : Experiment and evaluation.

Chapter Six : Conclusion and future work.

References.

American Psychological Association (APA)

al-Nuaymi, Hamid Ali Husayn. (2016). Sentences ordering approach for multi-document summarization in domain specific text document. (Master's thesis). Amman Arab University, Jordan
https://search.emarefa.net/detail/BIM-722658

Modern Language Association (MLA)

al-Nuaymi, Hamid Ali Husayn. Sentences ordering approach for multi-document summarization in domain specific text document. (Master's thesis). Amman Arab University. (2016).
https://search.emarefa.net/detail/BIM-722658

American Medical Association (AMA)

al-Nuaymi, Hamid Ali Husayn. (2016). Sentences ordering approach for multi-document summarization in domain specific text document. (Master's thesis). Amman Arab University, Jordan
https://search.emarefa.net/detail/BIM-722658

Language

English

Data Type

Arab Theses

Record ID

BIM-722658