A hybrid statistical and morphological Arabic language diacritizing system

Other Title(s)

نظام تشكيل اللغة العربية الهجين الإحصائي و الصرفي

Dissertant

Hattab, Abd Allah al-Mamun

Thesis advisor

Husayn, Abd al-Amir Khalaf

Committee Members

Salit, Azzam
Naum, Riyad S.

University

Middle East University

Faculty

Faculty of Information Technology

Department

Computer Science Department

University Country

Jordan

Degree

Master

Degree Date

2012

English Abstract

This thesis presents a hybrid Arabic diacritizing system.

The main objective of this thesis is to build a system that diacritizes Arabic text automatically using a statistical model and a morpho-syntactic model.

The first part of the system determines the most likely diacritics by choosing, for each Arabic sub-sentence, the fully diacritized form with the highest weight and estimated probability.
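As a minimal sketch of this selection step, the Python below picks the candidate with the highest relative frequency for an undiacritized input. The count table, the romanized example forms, and the function name `best_diacritization` are illustrative assumptions and are not taken from the thesis, which may weight and score candidates differently.

```python
from collections import Counter

# Hypothetical toy counts: how often each diacritized form was observed for
# an undiacritized input (romanized for readability). Not data from the thesis.
COUNTS = {
    "ktb": Counter({"kataba": 6, "kutub": 3, "kutiba": 1}),
}

def best_diacritization(phrase):
    """Return (form, probability) for the most frequent candidate, or None if unseen."""
    candidates = COUNTS.get(phrase)
    if not candidates:
        return None
    total = sum(candidates.values())
    form, count = candidates.most_common(1)[0]
    # Relative frequency serves as a simple probability estimate.
    return form, count / total

print(best_diacritization("ktb"))
```

An input that is absent from the table returns `None`, which is the point where a hybrid system could hand the word over to a morphological component.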

The second part of the system factorizes and tokenizes each Arabic word into its possible morpho-syntactic constituents: pattern, prefix, suffix, stem, and root.

After factorizing, the morpho-syntactical part selects the most likely diacritization sequence from different factorizations of the word.
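The factorization idea can be sketched as enumerating every prefix/stem/suffix split that is consistent with known affix lists. The affix lists, the romanized input, and the function name `factorize` below are hypothetical placeholders; the sketch does not cover roots or patterns, and it is not the analyzer used in the thesis.

```python
# Hypothetical, tiny affix inventories in a romanized notation.
PREFIXES = ("", "w", "Al", "wAl")
SUFFIXES = ("", "h", "hm", "At")

def factorize(word, min_stem=2):
    """List all (prefix, stem, suffix) splits allowed by the affix lists."""
    results = []
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES:
            if not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            # Discard splits that leave an implausibly short stem.
            if len(stem) >= min_stem:
                results.append((prefix, stem, suffix))
    return results

for split in factorize("wAlktAb"):
    print(split)
```

Each split returned here would then be a candidate for scoring, so that the most likely diacritization can be chosen among the competing factorizations.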

Most previous work on diacritization depends on tools such as the Hidden Markov Model Toolkit (HTK), on higher-level linguistic knowledge such as morphology and syntax, or on both. This system instead combines a statistical machine translation algorithm with the ElixirFM morphological analyzer.

The hybrid system achieves a higher accuracy rate than earlier studies while covering a larger domain of Arabic words.

Main Subjects

Information Technology and Computer Science

No. of Pages

69

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction.

Chapter Two : Literature survey and related work.

Chapter Three : Arabic morpho-syntactical analysis.

Chapter Four : Statistical machine translation.

Chapter Five : Proposed model and methodology.

Chapter Six : Experiments results.

Chapter Seven : Conclusion and future work.

References.

American Psychological Association (APA)

Hattab, Abd Allah al-Mamun. (2012). A hybrid statistical and morphological Arabic language diacritizing system. (Master's thesis). Middle East University, Jordan.
https://search.emarefa.net/detail/BIM-693803

Modern Language Association (MLA)

Hattab, Abd Allah al-Mamun. A hybrid statistical and morphological Arabic language diacritizing system. (Master's thesis). Middle East University. (2012).
https://search.emarefa.net/detail/BIM-693803

American Medical Association (AMA)

Hattab, Abd Allah al-Mamun. (2012). A hybrid statistical and morphological Arabic language diacritizing system. (Master's thesis). Middle East University, Jordan.
https://search.emarefa.net/detail/BIM-693803

Language

English

Data Type

Arab Theses

Record ID

BIM-693803