Automatic pronunciation dictionary toolkit for Arabic speech recognition using SPHINX engine

Thesis author

Hiyasat, Husayn Abd al-Razzaq Radwan

Thesis supervisor

Abu Zitar, Raid

Committee members

Kanan, Ghassan Jaddu
al-Yasin, Mustafa

University

Arab Academy for Financial and Banking Sciences

Faculty

Faculty of Information Systems and Technology

University country

Jordan

Degree

Ph.D.

Degree date

2007

English abstract

Although the Arab world has an estimated 250 million Arabic speakers, there has been little research on Arabic speech recognition compared to other languages of similar or lesser importance (e.g. Mandarin).

Due to the lack of diacritized Arabic text and the lack of a Pronunciation Dictionary (PD), most previous work on Arabic Automatic Speech Recognition has concentrated on developing recognizers that use Romanized characters, i.e. the system recognizes the Arabic word as if it were an English one, then maps it to the Arabic word through a lookup table that maps each Arabic word to its Romanized pronunciation.

In this thesis, we introduce the first SPHINX-IV-based Arabic recognizer and propose an automatic toolkit capable of producing a PD for both the Holy Qur'an and standard Arabic.
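As a rough illustration of what a grapheme-based pronunciation dictionary entry can look like, the sketch below maps each character of a diacritized Arabic word to a phone label and prints the word followed by its phones. The mapping table, the phone names, and the functions `word_to_phones` and `dictionary_line` are assumptions made for this example only; they are not the thesis's toolkit or phone set, and the table covers just a few letters.

```python
# Hypothetical, deliberately tiny grapheme-to-phone table (NOT the thesis's phone set).
CONSONANTS = {
    "\u0628": "B",   # beh
    "\u062A": "T",   # teh
    "\u062F": "D",   # dal
    "\u0631": "R",   # reh
    "\u0633": "S",   # seen
    "\u0643": "K",   # kaf
    "\u0644": "L",   # lam
    "\u0645": "M",   # meem
}
SHORT_VOWELS = {
    "\u064E": "AE",  # fatha
    "\u064F": "UH",  # damma
    "\u0650": "IH",  # kasra
}
SUKUN = "\u0652"     # marks "no vowel": contributes no phone
SHADDA = "\u0651"    # marks gemination: the consonant is doubled


def word_to_phones(word):
    """Map a fully diacritized word to a list of (assumed) phone labels."""
    phones = []
    last_consonant = None  # index in `phones` of the most recent consonant
    for ch in word:
        if ch in CONSONANTS:
            last_consonant = len(phones)
            phones.append(CONSONANTS[ch])
        elif ch in SHORT_VOWELS:
            phones.append(SHORT_VOWELS[ch])
        elif ch == SHADDA:
            if last_consonant is None:
                raise ValueError("shadda without a preceding consonant")
            # Duplicate the consonant in place, whether the shadda was
            # written before or after the short vowel.
            phones.insert(last_consonant, phones[last_consonant])
        elif ch == SUKUN:
            continue
        else:
            raise ValueError(f"unmapped character: U+{ord(ch):04X}")
    return phones


def dictionary_line(word):
    """Format one entry as the word followed by its space-separated phones."""
    return f"{word} {' '.join(word_to_phones(word))}"


if __name__ == "__main__":
    kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf-fatha, teh-fatha, beh-fatha
    print(dictionary_line(kataba))
```

The point of the sketch is that fully diacritized text makes a rule-based mapping possible at all: without the short-vowel marks, the vowel phones could not be read off the spelling.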

Three corpora were fully developed in this thesis: the Holy Qur'an Corpus HQC-1 (about 18.5 hours), the command and control corpus CAC-1 (about 1.5 hours), and the Arabic digits corpus ADC (less than one hour of speech).

The building process is described in full.

Fully diacritized Arabic transcriptions for all three corpora were developed as well.

The SPHINX-IV engine was customized and trained for both the language model and the lexicon modules shown in the framework architecture block diagram below.

Using the three corpora mentioned above, the PD produced by our automatic tool, and the transcripts, the SPHINX-IV engine is trained and tuned to develop three acoustic models, one for each corpus.

[Figure: framework architecture block diagram. Components: Front-end Processing, Search Algorithm, Language Model, Lexicon, Acoustic Models, Recognition Hypothesis.]

Training is based on an HMM model built on statistical information and random variable distributions extracted from the training data itself.

A new algorithm is proposed to add unlabeled data to the training corpus in order to increase the corpus size.

This algorithm is based on a Neural Network confidence scorer, which is used to annotate the decoded speech and decide whether the proposed transcript is accepted and can be added to the seed corpus.
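The accept-or-reject step described above can be sketched as a simple threshold filter. This is a minimal sketch under stated assumptions: the confidence score is taken as already computed by some external scorer (the thesis uses a neural network for this, which is not reproduced here), and the names `DecodedUtterance`, `augment_corpus`, and the default threshold of 0.9 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DecodedUtterance:
    audio_id: str
    hypothesis: str     # transcript proposed by the decoder
    confidence: float   # assumed to lie in [0, 1], supplied by an external scorer


def augment_corpus(seed, decoded, threshold=0.9):
    """Return (new_corpus, rejected).

    `seed` is a list of (audio_id, transcript) pairs. Decoded utterances whose
    confidence reaches `threshold` are appended with their hypothesis as the
    transcript; the rest are returned separately. `seed` itself is not modified.
    """
    accepted = [u for u in decoded if u.confidence >= threshold]
    rejected = [u for u in decoded if u.confidence < threshold]
    new_corpus = list(seed) + [(u.audio_id, u.hypothesis) for u in accepted]
    return new_corpus, rejected


if __name__ == "__main__":
    seed = [("utt-001", "reference transcript")]
    decoded = [
        DecodedUtterance("utt-101", "decoded transcript a", 0.95),
        DecodedUtterance("utt-102", "decoded transcript b", 0.40),
    ]
    corpus, rejected = augment_corpus(seed, decoded)
    print(len(corpus), "entries kept;", len(rejected), "rejected")
```

The threshold trades corpus growth against transcript quality: a lower value admits more automatically labeled speech but also more decoding errors into the training data.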

The model parameters were fine-tuned using a simulated annealing algorithm; optimum values were tested and reported.

Our major contribution is the use of the open source SPHINX-IV model for Arabic speech recognition, building our own language and acoustic models without Romanizing the Arabic speech.
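For readers unfamiliar with simulated annealing, the following is a generic sketch of the technique, not the thesis's tuning procedure. The cost function here is a toy quadratic standing in for a recognition error measure; `toy_cost`, `gaussian_neighbor`, and all default settings (step count, starting temperature, cooling rate) are assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(cost, start, neighbor, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Minimize `cost` starting from `start`; returns (best_point, best_cost)."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept a worse candidate with
        # probability exp(-delta / temperature), which shrinks as we cool.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling
    return best, best_cost


def toy_cost(params):
    """Toy stand-in for an error rate, minimized at (3, -1)."""
    x, y = params
    return (x - 3.0) ** 2 + (y + 1.0) ** 2


def gaussian_neighbor(params, rng):
    """Propose a nearby point by adding small Gaussian noise to each parameter."""
    return tuple(p + rng.gauss(0.0, 0.5) for p in params)


if __name__ == "__main__":
    best, best_cost = simulated_annealing(toy_cost, (0.0, 0.0), gaussian_neighbor)
    print("best parameters:", best, "cost:", best_cost)
```

In a real tuning run, each cost evaluation would involve decoding a validation set with the candidate parameters, so the number of steps would be far smaller than in this toy setting.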

The system is fine-tuned and data are refined for training and validation.

Optimum values for the number of Gaussian mixture distributions and the number of states in the HMMs were found according to specified performance measures.

Optimum values for confidence scores were found for the training data.

Although much more work needs to be done to complete a project of this size, we consider the corpus used in our system sufficient to validate our approach.

SPHINX has never been used before in this manner for Arabic speech recognition.

The thesis is an invitation for all open source speech recognition developers and groups to take over and capitalize on what we have started.

Main subjects

Languages and Comparative Literature
Information Technology and Computer Science

Topics

Number of pages

142

Table of contents

Table of contents.

Abstract.

Chapter one : Thesis introduction.

Chapter two : A general review of speech recognition issues.

Chapter three : Automatic speech recognition background information.

Chapter four : Arabic speech sounds and properties.

Chapter five : Grapheme based pronunciation dictionary for Arabic.

Chapter six : Experimental environment.

Chapter seven : SPHINX-IV parameters tuning.

References.

American Psychological Association (APA) citation style

Hiyasat, Husayn Abd al-Razzaq Radwan. (2007). Automatic pronunciation dictionary toolkit for Arabic speech recognition using SPHINX engine. (Doctoral dissertations Theses and Dissertations Master). Arab Academy for Financial and Banking Sciences, Jordan
https://search.emarefa.net/detail/BIM-304907

Modern Language Association (MLA) citation style

Hiyasat, Husayn Abd al-Razzaq Radwan. Automatic pronunciation dictionary toolkit for Arabic speech recognition using SPHINX engine. (Doctoral dissertations Theses and Dissertations Master). Arab Academy for Financial and Banking Sciences. (2007).
https://search.emarefa.net/detail/BIM-304907

American Medical Association (AMA) citation style

Hiyasat, Husayn Abd al-Razzaq Radwan. (2007). Automatic pronunciation dictionary toolkit for Arabic speech recognition using SPHINX engine. (Doctoral dissertations Theses and Dissertations Master). Arab Academy for Financial and Banking Sciences, Jordan
https://search.emarefa.net/detail/BIM-304907

Text language

English

Data type

Theses

Record number

BIM-304907