Statistical Arabic grammar analyzer based on rules mining approach using Naïve Bayesian algorithm

Other titles

محلل نحوي عربي إحصائي مبني على منهج التنقيب عن القواعد باستخدام خوارزمية الناييف بييزين

Thesis author

al-Faris, Ahmad Wasif

Thesis advisor

Abu Shurayhah, Ahmad Adil

Committee members

al-Shruf, Fayiz
Hammu, Bassam

University

Middle East University

Faculty

Faculty of Information Technology

Academic department

Department of Computer Science

University country

Jordan

Degree

Master's

Degree date

2017

English abstract

Arabic sentences have always been a challenge because most of them can carry more than one meaning.

What determines the desired meaning is grammar analysis.

Grammar analysis is the process of determining the grammatical tag, grammatical case, and grammatical diacritic (on the last character of the word) of each word in an Arabic sentence.
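As a rough illustration of what such an analysis records per word, the sketch below models the three outputs named above as a small Python record. The class name, field names, and label strings are assumptions made for this example and are not taken from the thesis; the sample sentence is a textbook verbal sentence ("The student wrote the lesson").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordAnalysis:
    """Grammar analysis of one word (hypothetical field names)."""
    word: str
    tag: str        # grammatical tag, e.g. verb, subject, object
    case: str       # grammatical case
    diacritic: str  # diacritic on the last character of the word


# Illustrative analysis of a three-word verbal sentence.
sentence = [
    WordAnalysis("كتب", "past verb", "indeclinable", "fatha"),
    WordAnalysis("الطالب", "subject", "nominative", "damma"),
    WordAnalysis("الدرس", "object", "accusative", "fatha"),
]


def ending_diacritics(analysis):
    """Return the final diacritic assigned to each word, in order."""
    return [w.diacritic for w in analysis]
```

Calling `ending_diacritics(sentence)` lists the case endings word by word, which is the part of the analysis that resolves how the sentence is read.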

There are two approaches to grammar analysis for the Arabic language: the rule-based approach and the statistical approach.

However, the rule-based approach suffers from various drawbacks: it can handle only short sentences, it requires extensive language knowledge and resources that are hard to obtain, and it is time-consuming.

Additionally, the free word order of Arabic sentences on the one hand, and the presence of elliptic personal pronouns on the other, increase the difficulty not only for the rule-based approach but also for building an efficient context-free grammar (CFG).

This thesis proposes an approach to automating Arabic grammar analysis that attempts to overcome the problems and setbacks that emerged with the rule-based approach.

The proposed approach consists of four stages: the input stage, the feature extraction and structured data building stage, the learning stage, and the discovery stage.

In the first stage, each word in a sentence is manually annotated with its corresponding grammar analysis.

In the second stage, 14 features are extracted for each word in the sentences of the corpus.

In the third stage, called the learning stage, the annotated corpus of sentences is fed to the system and passed to the Naive Bayes classifier, from which a model is constructed.
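For reference, the standard Naive Bayes decision rule that such a classifier applies can be written as follows. This is the textbook formulation, not necessarily the exact one used in the thesis; here $c$ ranges over the candidate grammar categories and $f_1, \dots, f_{14}$ stand for the 14 features extracted for a word (the symbol names are assumptions).

```latex
\hat{c} \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{14} P(f_i \mid c)
```

The prior $P(c)$ and the conditional probabilities $P(f_i \mid c)$ are estimated from frequency counts over the annotated corpus, under the assumption that the features are independent given the category.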

In the fourth stage, called the discovery stage, a non-annotated corpus of sentences goes through the feature extraction process of the second stage, and the model constructed in the third stage is used to choose the most likely correct grammar category.
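The learning and discovery stages described above can be sketched as a small categorical Naive Bayes classifier. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `train` and `predict`, the three toy features (`pos`, `position`, `prev_pos`), the category labels, and the use of Laplace smoothing are all hypothetical choices for illustration, and the thesis's actual 14 features are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train(rows, alpha=1.0):
    """Learning stage: count categories and per-feature value frequencies."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, feature) -> value counts
    vocab = defaultdict(set)             # feature -> values seen in training
    for features, label in rows:
        label_counts[label] += 1
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            vocab[name].add(value)
    return {"labels": label_counts, "values": value_counts,
            "vocab": vocab, "alpha": alpha}


def predict(model, features):
    """Discovery stage: return the category with the highest log-posterior."""
    total = sum(model["labels"].values())
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, count in model["labels"].items():
        score = math.log(count / total)  # log prior
        for name, value in features.items():
            seen = model["values"][(label, name)]
            # Laplace smoothing; the +1 reserves mass for unseen values.
            k = len(model["vocab"][name]) + 1
            score += math.log((seen[value] + alpha) / (count + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy annotated corpus (hypothetical features and categories).
training = [
    ({"pos": "verb", "position": "first", "prev_pos": "none"}, "verb"),
    ({"pos": "noun", "position": "middle", "prev_pos": "verb"}, "subject"),
    ({"pos": "noun", "position": "last", "prev_pos": "noun"}, "object"),
    ({"pos": "verb", "position": "first", "prev_pos": "none"}, "verb"),
    ({"pos": "noun", "position": "middle", "prev_pos": "verb"}, "subject"),
    ({"pos": "noun", "position": "last", "prev_pos": "noun"}, "object"),
]

model = train(training)
guess = predict(model, {"pos": "noun", "position": "middle", "prev_pos": "verb"})
```

Working in log space avoids numeric underflow when many feature probabilities are multiplied, and smoothing keeps a feature value that never co-occurred with a category from forcing that category's score to zero.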

Some of the features used are state, voice, aspect, mood, case, and part of speech (POS).

Although there are some limitations (e.g., the limited length of the sentences used, the limited set of features used, and the fact that not all words can be clearly reduced to a root), the results were satisfactory, with an accuracy of 75.38% for 7,204 sentences.

In conclusion, the proposed method is an attempt to resolve the ambiguity of Arabic sentences by making grammar analysis an easier process.

Main subjects

Information Technology and Computer Science

Topics

Number of pages

98

Table of contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction.

Chapter Two : Literature review and related works.

Chapter Three : Proposed work.

Chapter Four : The experimental results.

Chapter Five : Conclusion and future work.

References.

American Psychological Association (APA) citation style

al-Faris, Ahmad Wasif. (2017). Statistical Arabic grammar analyzer based on rules mining approach using Naïve Bayesian algorithm. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-762657

Modern Language Association (MLA) citation style

al-Faris, Ahmad Wasif. Statistical Arabic grammar analyzer based on rules mining approach using Naïve Bayesian algorithm. (Master's thesis). Middle East University. (2017).
https://search.emarefa.net/detail/BIM-762657

American Medical Association (AMA) citation style

al-Faris, Ahmad Wasif. (2017). Statistical Arabic grammar analyzer based on rules mining approach using Naïve Bayesian algorithm. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-762657

Text language

English

Data type

Theses and dissertations

Record number

BIM-762657