Automatic topic classification system of spoken Arabic news

Other titles

The automatic system for topic classification of news spoken in Arabic

Thesis author

Abu Sulayman, Nasir Sadiq Abd Allah

Thesis supervisor

al-Hanjuri, Muhammad Ahmad Muhammad

University

Islamic University

Faculty

Faculty of Engineering

Academic department

Department of Computer Engineering

University country

Palestine (Gaza Strip)

Degree

Master's

Degree date

2017

English abstract

One of the most important consequences of the so-called "Internet era" is the proliferation of varied electronic data.

This proliferation urgently calls for automated systems that classify such data, making it easier to search for and access a topic of interest.

Such systems are commonly applied to written texts.

Because the number of spoken files has grown enormously, there is an acute need for an automatic system that classifies spoken files by topic.

Such systems have been studied in previous research on spoken English, but spoken Arabic has rarely been considered because the Arabic language is challenging and datasets for it are scarce.

To address this challenge, a new dataset is built by converting into spoken form the common written corpus ALJAZEERA-NEWS, which is widely used in research on written-text classification.

Then, a keyword extraction method based on dynamic time warping is implemented to extract the keywords that represent each class.
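The abstract does not give the details of the keyword extraction step, so the following is only a minimal sketch of dynamic time warping itself, assuming each spoken word is represented as a sequence of fixed-length feature vectors. The function names `dtw_distance` and `closest_keyword`, the Euclidean frame distance, and the unconstrained step pattern are illustrative assumptions, not the thesis's implementation.

```python
import math


def dtw_distance(a, b):
    """Return the cumulative DTW alignment cost between two sequences
    of equal-dimension feature vectors (assumed representation)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of a repeated against b
                cost[i][j - 1],      # frame of b repeated against a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def closest_keyword(segment, templates):
    """Return the label of the template with the smallest DTW distance.

    `templates` is a hypothetical dict mapping keyword labels to
    reference feature sequences.
    """
    return min(templates, key=lambda label: dtw_distance(segment, templates[label]))
```

Because DTW lets one frame match several frames of the other sequence, two utterances of the same word spoken at different speeds can still align at low cost, which is why it suits template matching on speech.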

Finally, a topic identification system is built using Mel-frequency cepstral coefficients (MFCC) and relative spectral transform - perceptual linear prediction (RASTA-PLP) as speech features, with dynamic time warping (DTW) and hidden Markov models (HMM) as classifiers. The technique differs from the traditional approach, which uses automatic speech recognition to extract transcriptions. A segmentation method is also proposed for splitting spoken files into words.
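The abstract does not describe how the hidden Markov models are configured. As a hedged illustration of how an HMM can act as a classifier in general, the sketch below scores a sequence under one model per class with the scaled forward algorithm and picks the class with the highest log-likelihood. It assumes discrete observation symbols (for example, vector-quantized features); speech systems more often use continuous emission densities, and the names `forward_log_likelihood` and `classify` are hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a non-empty discrete observation sequence
    under one HMM, computed with the scaled forward algorithm.

    start[s]    : initial probability of state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_like = 0.0
    for symbol in obs[1:]:
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_like += math.log(scale)
        alpha = [x / scale for x in alpha]  # rescale to avoid underflow
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][symbol]
            for s in range(n)
        ]
    total = sum(alpha)
    return log_like + math.log(total) if total > 0.0 else float("-inf")


def classify(obs, models):
    """Return the label whose HMM gives the sequence the highest likelihood.

    `models` is a hypothetical dict: label -> (start, trans, emit).
    """
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))
```

Training one model per class and comparing likelihoods is a common design because adding a class only requires fitting a new model, not retraining the others.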

The system is evaluated using accuracy, precision, recall, and F1-measure.
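For readers unfamiliar with these metrics, the sketch below shows one way to compute them for a multi-class topic classifier. The abstract does not state which averaging scheme was used, so the macro average (an unweighted mean over classes) is an assumption here, as is the function name `evaluate`.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, macro_f1, per_class), where per_class maps each
    label to its (precision, recall, f1) tuple."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = (precision, recall, f1)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Macro average: every class counts equally (assumed scheme).
    macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(labels)
    return accuracy, macro_f1, per_class
```

Precision is the share of predictions for a class that are correct, recall is the share of that class's true items that were found, and F1 is their harmonic mean, so it is high only when both are high.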

The proposed system shows positive results in the topic classification field.

On average, the topic identification system achieves an F1-measure of 90.26% with the dynamic time warping classifier and 91.36% with the hidden Markov model classifier.

In addition, the system achieves a keyword identification accuracy of 89.65%.

Main subjects

Information Technology and Computer Science

Number of pages

96

Table of contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Related works.

Chapter Three: Background theory.

Chapter Four: Proposed work.

Chapter Five: Results and discussion.

Chapter Six: Conclusions and recommendations.

References.

American Psychological Association (APA) citation style

Abu Sulayman, Nasir Sadiq Abd Allah. (2017). Automatic topic classification system of spoken Arabic news. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-905179

Modern Language Association (MLA) citation style

Abu Sulayman, Nasir Sadiq Abd Allah. Automatic topic classification system of spoken Arabic news. (Master's thesis). Islamic University. (2017).
https://search.emarefa.net/detail/BIM-905179

American Medical Association (AMA) citation style

Abu Sulayman, Nasir Sadiq Abd Allah. (2017). Automatic topic classification system of spoken Arabic news. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-905179

Text language

English

Data type

University theses

Record number

BIM-905179