An approach for extracting Arabic word root based on voice writing forms

Thesis author

Abu Sini, Ahmad Abd al-Qadir

Thesis supervisor

Kanan, Ghassan Jaddu

Committee members

al-Shaykh, Asim A. R.
Kanan, Raid Karim
al-Qarini, Shihadah

University

Arab Academy for Financial and Banking Sciences

Faculty

Faculty of Information Systems and Technology

Academic department

Department of Computer Information Systems

University country

Jordan

Degree

Doctorate

Degree date

2011

English abstract

Arabic is a major international language, spoken in more than 23 countries, and the lingua franca of the Islamic world.

The number of Arabic-speaking Internet users has grown rapidly. However, efforts to improve information search in Arabic remain limited and modest compared with those for other languages.

The main barrier to advances in Arabic text processing is the complicated morphological structure of the language, so Arabic root extraction has not advanced as far as it has in other languages such as English. Stemming is the process of reducing words to their roots or stems.
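To make the notion of stemming concrete, here is a minimal sketch of light affix stripping in Python. It is not the method proposed in this thesis: the prefix and suffix lists, the `min_len` threshold, and the function name `light_stem` are assumptions chosen for illustration only.

```python
# Illustrative affix lists (assumptions, far from complete).
PREFIXES = ("\u0627\u0644",)      # definite article "al-"
SUFFIXES = (
    "\u0627\u062A",               # "-at"  (feminine plural)
    "\u0648\u0646",               # "-un"  (masculine plural, nominative)
    "\u064A\u0646",               # "-in"  (masculine plural, oblique)
    "\u0629",                     # ta marbuta
)


def light_stem(word: str, min_len: int = 3) -> str:
    """Strip at most one listed prefix and one listed suffix from a bare word."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


# "al-mu'allimun" (the teachers), written without diacritics.
teachers = "\u0627\u0644\u0645\u0639\u0644\u0645\u0648\u0646"
stem = light_stem(teachers)  # -> "\u0645\u0639\u0644\u0645" (mu'allim)
```

Note that the result is a stem, not the three-letter root (ain, lam, mim). Recovering the root requires morphological knowledge beyond simple affix removal, which is the gap that root-extraction approaches aim to close.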

In highly inflected languages such as Arabic, stemming is considered one of the most significant factors in improving language processing effectiveness. Unfortunately, most current stemmers remove diacritical marks along with affixes, neglecting their importance. These marks appear either above or below the letters and in many cases play an essential role in distinguishing, semantically and phonetically, between two words written with the same characters, especially in cases of vowels, ablaut and slurring. As a result, many errors occur when extracting the roots of words containing vowels, as well as in text categorization and text translation, when words have the same letters but different meanings.
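The loss of information caused by removing diacritics can be illustrated with a short Python sketch. It relies on the fact that Unicode classifies the Arabic short-vowel marks as nonspacing combining marks (category `Mn`). The function name `strip_diacritics` and the two example words are chosen here for illustration and do not come from the thesis.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Drop every nonspacing combining mark (Unicode category Mn)."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


# Two words that share the letters ain, lam, mim but differ in vocalization.
ilm = "\u0639\u0650\u0644\u0652\u0645"    # 'ilm  ("knowledge"): kasra, sukun
alam = "\u0639\u064E\u0644\u064E\u0645"   # 'alam ("flag"): fatha, fatha

bare_ilm = strip_diacritics(ilm)
bare_alam = strip_diacritics(alam)

# The diacritized forms are distinct, but the bare forms are identical.
collapsed = (ilm != alam) and (bare_ilm == bare_alam)  # True
```

Once both words are reduced to the same bare string, no later processing stage can tell them apart, which is the kind of ambiguity the abstract attributes to diacritic-stripping stemmers.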

In our opinion, one of the main reasons for ambiguity in Arabic language processing is the neglect of diacritical marks. Motivated by the need to enhance Arabic search, we developed a system for Arabic language Voice Writing (AVW) that handles diacritized Arabic script and converts it to Arabic speech symbols, the "Voice Writing Forms". These are the smallest units of Arabic that depend on word pronunciation, and they are used to measure the internal relationships within the morphological structure of Arabic.

We suggest using these "Voice Writing" symbols to extract Arabic word roots, to enhance Arabic text categorization and text translation, and to help learners of Arabic read and understand the relationships between words. We therefore expect the approach to resolve much of the ambiguity in Arabic that arises from neglecting diacritical marks. We evaluated root extraction using three accuracy measures. The first is a comparative measure: we ran our system and three main stemming algorithms on the same sample document, and the extracted roots were checked manually by an Arabic language scholar. In this case the Arabic voice system gave a significant result, with an accuracy rate of 88.14%. The second is a correctness measure: we ran our system on a selected Arabic document containing 4000 diacritized Arabic words, and the extracted roots were checked manually by an Arabic language scholar. The accuracy rate in this case was 90.45%. The third is a diacritical marks measure: we ran our system on a selected Arabic document containing 1000 Arabic words in two runs. In the first run, which contained diacritized words, the accuracy rate was 90.7%. In the second run, which contained bare words (without diacritics), the accuracy rate was 70.8%.
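The accuracy rates above are presumably the share of extracted roots that the scholar judged correct. A minimal Python sketch of that calculation follows, assuming accuracy is defined as correct roots divided by total words. The function name `accuracy` and the toy root lists are hypothetical and are not data from the thesis. Only the 90.7% and 70.8% figures are taken from the abstract.

```python
def accuracy(extracted, reference):
    """Percentage of extracted roots that match the manually verified roots."""
    if len(extracted) != len(reference):
        raise ValueError("both lists must cover the same words")
    if not reference:
        raise ValueError("at least one word is required")
    correct = sum(e == r for e, r in zip(extracted, reference))
    return 100.0 * correct / len(reference)


# Hypothetical toy data (transliterated roots), for illustration only.
extracted_roots = ["ktb", "drs", "qwl", "elm"]
verified_roots = ["ktb", "drs", "qwl", "aml"]
toy_accuracy = accuracy(extracted_roots, verified_roots)  # 75.0

# Gap between the two runs reported for the 1000-word document.
with_diacritics = 90.7
without_diacritics = 70.8
gap_points = round(with_diacritics - without_diacritics, 1)  # 19.9
```

On the reported figures, removing the diacritics costs 19.9 percentage points of accuracy on the same document.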

The results indicate that our approach is efficient in dealing with diacritized text, since it is more accurate and precise. We then applied our system to Arabic text categorization and Arabic text translation.

Manual mathematical experiments were then carried out, and the results indicate that using AVW will enhance these processes.

The main ideas in this research have been disseminated in international journals and highly refereed conference proceedings.

Main subjects

Information Technology and Computer Science

Topics

Number of pages

104

List of contents

Table of contents.

Abstract.

Chapter one: Introduction.

Chapter two: Literature review.

Chapter three: Arabic language.

Chapter four: Arabic language voice writing stemmer (AVWS).

Chapter five: (AVW) role in text categorization and text translation.

Chapter six: Conclusion and future work.

References.

American Psychological Association (APA) citation style

Abu Sini, Ahmad Abd al-Qadir. (2011). An approach for extracting Arabic word root based on voice writing forms. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan
https://search.emarefa.net/detail/BIM-306696

Modern Language Association (MLA) citation style

Abu Sini, Ahmad Abd al-Qadir. An approach for extracting Arabic word root based on voice writing forms. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences. (2011).
https://search.emarefa.net/detail/BIM-306696

American Medical Association (AMA) citation style

Abu Sini, Ahmad Abd al-Qadir. (2011). An approach for extracting Arabic word root based on voice writing forms. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan
https://search.emarefa.net/detail/BIM-306696

Text language

English

Data type

Theses

Record number

BIM-306696