A hybrid statistical and morphological Arabic language diacritizing system

Other titles

نظام تشكيل اللغة العربية الهجين الإحصائي و الصرفي

Thesis author

Hattab, Abd Allah al-Mamun

Thesis advisor

Husayn, Abd al-Amir Khalaf

Committee members

Salit, Azzam
Naum, Riyad S.

University

Middle East University

Faculty

Faculty of Information Technology

Academic department

Department of Computer Science

University country

Jordan

Degree

Master's

Degree date

2012

English abstract

This thesis presents a hybrid Arabic diacritizing system.

The main objective of this thesis is to build a system that diacritizes Arabic text automatically using a statistical model and a morpho-syntactical model.

The first part of the system determines the most likely diacritics by choosing, for each Arabic sub-sentence, the fully diacritized form with the highest weight and probability estimate.
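As a rough illustration of choosing the highest-probability diacritized form, the following Python sketch looks up an undiacritized phrase in a table built from diacritized training phrases. It is a simplified relative-frequency lookup, not the statistical machine translation model used in the thesis; the function names and the toy corpus are assumptions made for this example.

```python
import re
from collections import defaultdict

# Tanwin, short vowels, shadda, sukun (U+064B..U+0652) and dagger alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text):
    """Remove Arabic diacritic marks, leaving the bare letters."""
    return DIACRITICS.sub("", text)


def build_table(diacritized_phrases):
    """Count how often each diacritized phrase realises its bare form."""
    table = defaultdict(lambda: defaultdict(int))
    for phrase in diacritized_phrases:
        table[strip_diacritics(phrase)][phrase] += 1
    return table


def best_diacritization(table, bare_phrase):
    """Return (phrase, relative frequency) for the top candidate, or None."""
    candidates = table.get(bare_phrase)
    if not candidates:
        return None  # phrase never seen in training data
    total = sum(candidates.values())
    best = max(candidates, key=candidates.get)
    return best, candidates[best] / total


# Hypothetical toy corpus: "kataba" seen twice, "kutubun" seen once.
toy_corpus = ["كَتَبَ", "كَتَبَ", "كُتُبٌ"]
table = build_table(toy_corpus)
result = best_diacritization(table, "كتب")
```

A phrase that the table does not cover returns `None`, which is the point at which a hybrid system could fall back to word-level morphological analysis.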

The second part of the system factorizes and tokenizes each Arabic word into its possible morpho-syntactical constituents: pattern, prefix, suffix, stem, and root.
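The factorization step can be pictured with a small Python sketch that enumerates the possible prefix, stem, and suffix splits of a word. The affix lists below are tiny hypothetical samples, the function name is an assumption, and the sketch does not attempt pattern or root extraction, which the thesis delegates to a full morphological analyzer.

```python
# Hypothetical, deliberately incomplete affix inventories for illustration.
PREFIXES = ["", "ال", "و", "وال", "ب", "بال"]
SUFFIXES = ["", "ة", "ات", "ون", "ين", "ها"]


def factorize(word, min_stem=2):
    """Enumerate every (prefix, stem, suffix) split allowed by the toy lists."""
    splits = []
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            # Discard splits whose stem is too short to be a plausible word.
            if len(stem) >= min_stem:
                splits.append((prefix, stem, suffix))
    return splits


# "walkitab" (and the book) yields three candidate splits with these lists.
candidates = factorize("والكتاب")
```

Each word usually admits several splits, which is why a separate selection step is needed afterwards.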

After factorization, the morpho-syntactical part selects the most likely diacritization sequence from among the different factorizations of the word.
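One simple way to picture this selection step is to score each candidate analysis by summing the log-probabilities of its diacritized segments and keeping the best one. The scoring scheme, the floor value for unseen segments, and the probabilities below are all assumptions for illustration; the abstract does not specify how the thesis ranks factorizations.

```python
import math


def score(analysis, segment_logprob, unseen=-10.0):
    """Sum segment log-probabilities; unseen segments get a floor value."""
    return sum(segment_logprob.get(seg, unseen) for seg in analysis if seg)


def select_diacritization(analyses, segment_logprob):
    """Return the joined segments of the highest-scoring analysis, or None."""
    if not analyses:
        return None
    best = max(analyses, key=lambda a: score(a, segment_logprob))
    return "".join(best)


# Hypothetical segment probabilities for two readings of the same bare word.
toy_logprob = {
    "وَ": math.log(0.9),
    "كَتَبَ": math.log(0.6),
    "كُتُب": math.log(0.3),
}
toy_analyses = [("وَ", "كُتُب", ""), ("وَ", "كَتَبَ", "")]
chosen = select_diacritization(toy_analyses, toy_logprob)
```

Working in log space keeps the product of many small probabilities numerically stable, and the floor value prevents a single unseen segment from making an analysis impossible to rank.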

Most previous work on diacritization relies only on tools such as the Hidden Markov Model Toolkit (HTK) and/or on higher-level linguistic knowledge such as morphology and syntax, whereas this system uses a statistical machine translation algorithm together with the ElixirFM morphological analyzer.

The hybrid system achieves a higher accuracy rate than earlier studies while covering a larger domain of Arabic words.

Main subjects

Information Technology and Computer Science

Number of pages

69

Table of contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Literature survey and related work.

Chapter Three: Arabic morpho-syntactical analysis.

Chapter Four: Statistical machine translation.

Chapter Five: Proposed model and methodology.

Chapter Six: Experiments results.

Chapter Seven: Conclusion and future work.

References.

American Psychological Association (APA) citation style

Hattab, Abd Allah al-Mamun. (2012). A hybrid statistical and morphological Arabic language diacritizing system. (Master's thesis). Middle East University, Jordan.
https://search.emarefa.net/detail/BIM-693803

Modern Language Association (MLA) citation style

Hattab, Abd Allah al-Mamun. A hybrid statistical and morphological Arabic language diacritizing system. (Master's thesis). Middle East University. (2012).
https://search.emarefa.net/detail/BIM-693803

American Medical Association (AMA) citation style

Hattab, Abd Allah al-Mamun. (2012). A hybrid statistical and morphological Arabic language diacritizing system. (Master's thesis). Middle East University, Jordan.
https://search.emarefa.net/detail/BIM-693803

Text language

English

Data type

Theses

Record number

BIM-693803