Automatic topic classification system of spoken Arabic news

Other Title(s)

النظام الآلي للتصنيف الموضوعي للأخبار المنطوقة باللغة العربية

Dissertant

Abu Sulayman, Nasir Sadiq Abd Allah

Thesis advisor

al-Hanjuri, Muhammad Ahmad Muhammad

University

Islamic University

Faculty

Faculty of Engineering

Department

Department of Computer Engineering

University Country

Palestine (Gaza Strip)

Degree

Master

Degree Date

2017

English Abstract

One of the most important consequences of what is known as the "Internet era" is the widespread availability of varied electronic data.

This proliferation creates an urgent need for automated systems that classify such data, making it easier to search for and access a topic of interest.

Such systems are commonly applied to written text.

Because the number of spoken files has grown enormously, there is an acute need for an automatic system that classifies spoken files by topic.

Such systems have been studied in previous research on spoken English, but spoken Arabic has rarely been considered because the Arabic language is challenging and datasets for it are scarce.

To address this challenge, a new dataset is built by converting a common written-text corpus (ALJAZEERA-NEWS), which is widely used in research on written text classification, into spoken form.

Then, a keyword extraction method based on dynamic time warping is implemented to extract the keywords that represent each class.
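
The abstract does not give implementation details, so the following is only a minimal sketch of textbook dynamic time warping, not the author's method. It assumes each spoken word is already represented as a sequence of feature vectors (for example MFCC frames); the names `dtw_distance` and `closest_keyword`, and the template dictionary, are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors.

    a and b are lists of equal-dimension float lists (one per frame).
    Frame-to-frame cost is Euclidean distance (an assumption).
    """
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def closest_keyword(segment, templates):
    """Return the template label with the smallest DTW cost to segment.

    templates maps a keyword label to one reference feature sequence
    (hypothetical structure, for illustration only).
    """
    return min(templates, key=lambda label: dtw_distance(segment, templates[label]))
```

Because DTW allows one frame to align with several frames of the other sequence, two utterances of the same word spoken at different speeds can still yield a low cost.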

Finally, a topic identification system is built with Mel-frequency cepstral coefficients (MFCC) and relative spectral transform - perceptual linear prediction (RASTA-PLP) as speech features, and dynamic time warping (DTW) and hidden Markov models (HMM) as classifiers. The technique differs from the traditional approach, which uses automatic speech recognition to extract transcriptions. A segmentation method is also proposed to split spoken files into words.

The system is evaluated using accuracy, precision, recall, and F1-measure.
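
For reference, the standard definitions of these four metrics can be sketched as follows. This is an illustration under stated assumptions, not the thesis's evaluation code: the function names and the topic labels in the usage note are hypothetical, and the abstract does not say how per-class scores were averaged.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted topic equals the true topic."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one topic label (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (one possible averaging choice)."""
    labels = sorted(set(y_true))
    scores = [precision_recall_f1(y_true, y_pred, lb)[2] for lb in labels]
    return sum(scores) / len(scores)
```

For example, with true topics `["sport", "sport", "economy", "economy"]` and predictions `["sport", "economy", "economy", "economy"]`, `accuracy` is 0.75, and for the label `"sport"` precision is 1.0 while recall is 0.5.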

The proposed system shows positive results in the topic classification field.

On average, the topic identification system achieves an F1-measure of 90.26% with the dynamic time warping classifier and 91.36% with the hidden Markov model classifier.

In addition, the system achieves a keyword identification accuracy of 89.65%.

Main Subjects

Information Technology and Computer Science

No. of Pages

96

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction.

Chapter Two : Related works.

Chapter Three : Background theory.

Chapter Four : Proposed work.

Chapter Five : Results and discussion.

Chapter Six : Conclusions and recommendations.

References.

American Psychological Association (APA)

Abu Sulayman, Nasir Sadiq Abd Allah. (2017). Automatic topic classification system of spoken Arabic news. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-905179

Modern Language Association (MLA)

Abu Sulayman, Nasir Sadiq Abd Allah. Automatic topic classification system of spoken Arabic news. (Master's thesis). Islamic University. (2017).
https://search.emarefa.net/detail/BIM-905179

American Medical Association (AMA)

Abu Sulayman, Nasir Sadiq Abd Allah. (2017). Automatic topic classification system of spoken Arabic news. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-905179

Language

English

Data Type

Arab Theses

Record ID

BIM-905179