Resolving cross-language problem in Arabic proper name transliteration

Author

Hana, M. A.

Source

International Journal of Intelligent Computing and Information Sciences

Issue

Vol. 6, no. 2 (31 July 2006), 9 p.

Publisher

Ain Shams University, Faculty of Computer and Information Sciences

Publication date

2006-07-31

Country of publication

Egypt

Number of pages

9

Main subjects

Information Technology and Computer Science

Topics

Abstract (EN)

The aim of this study is to develop a machine learning algorithm that clusters multiple versions or scripts of Arabic proper names.

The proposed model is expected to be useful for unique subject identification, machine translation, cross-language information retrieval, and speech synthesis.

Each proper name is represented as a binary matrix that preserves the order of the alphabet.
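The abstract does not give the exact layout of the binary matrix. As a minimal sketch of one possible reading, the example below encodes a transliterated name as one row per letter, with columns laid out in alphabetical order. The function name `encode_name`, the restriction to the letters a-z, and the rule of skipping other characters are assumptions made for illustration, not details taken from the paper.

```python
import string

# Assumption: names are in Latin script and columns follow a-z order.
ALPHABET = string.ascii_lowercase


def encode_name(name):
    """Return one one-hot row per letter; column j corresponds to ALPHABET[j]."""
    rows = []
    for ch in name.lower():
        if ch not in ALPHABET:
            continue  # assumption: characters outside a-z are skipped
        row = [0] * len(ALPHABET)
        row[ALPHABET.index(ch)] = 1
        rows.append(row)
    return rows


matrix = encode_name("Ali")
for letter_row in matrix:
    print("".join(str(bit) for bit in letter_row))
```

Each printed row contains a single 1, placed in the column of the corresponding letter.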

Then, two stages of preprocessing are applied.

The first preprocessing step unifies the data into one form, and the second unifies the name length.
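The abstract does not say how the two preprocessing steps are carried out. The sketch below shows one plausible interpretation: lowercasing and stripping non-letters to unify the form, then truncating or padding to a fixed length. The names `unify_form`, `unify_length`, `PAD_CHAR`, and `TARGET_LENGTH`, and the specific values chosen, are hypothetical.

```python
import re

PAD_CHAR = "_"      # hypothetical padding symbol
TARGET_LENGTH = 10  # hypothetical fixed name length


def unify_form(name):
    """Lowercase the name and drop everything except the letters a-z."""
    return re.sub(r"[^a-z]", "", name.lower())


def unify_length(name, length=TARGET_LENGTH):
    """Truncate or right-pad so every name has exactly `length` characters."""
    return name[:length].ljust(length, PAD_CHAR)


def preprocess(name):
    """Apply both steps in order: unify the form, then unify the length."""
    return unify_length(unify_form(name))


for raw in ["Mohamed", "Al-Sayed", "Abd El Rahman"]:
    print(repr(preprocess(raw)))
```

After both steps every name has the same length, so each one maps to a vector of the same dimension.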

The k-means algorithm is used for clustering, along with a simple supervised clustering algorithm based on Euclidean distance.
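The abstract does not describe the supervised algorithm in detail. One common scheme that fits the description is a nearest-centroid rule: average the training vectors of each labelled cluster, then assign a new vector to the closest centroid by Euclidean distance. The sketch below illustrates that rule under this assumption. The same nearest-centroid assignment is also the assignment step of k-means. The function names and the toy vectors are illustrative only.

```python
import math
from collections import defaultdict


def fit_centroids(vectors, labels):
    """Average the training vectors of each labelled cluster."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }


def assign(vector, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Toy data: two clusters of 3-dimensional binary vectors (illustrative only).
train_vectors = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
train_labels = ["A", "A", "B", "B"]
centroids = fit_centroids(train_vectors, train_labels)
print(assign([1, 0, 0], centroids))
```

`math.dist` requires Python 3.8 or later.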

The data set contains 725 unique Arabic names transliterated into English, which total 1340 names when their variant versions are included.

Five experiments were conducted, differing in the percentage of the data used for training: 50%, 60%, 70%, 80%, and 90%.

For each experiment, the training samples are randomly selected in five repetitions, and the average results are reported.
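The evaluation protocol described here (random training selection, five repetitions, averaged results) can be sketched as a repeated hold-out loop. The helper `repeated_holdout`, the `evaluate` callback, the fixed seed, and the placeholder scoring function are assumptions for illustration. The paper's actual scoring procedure is not given in the abstract.

```python
import random
import statistics


def repeated_holdout(samples, train_fraction, evaluate, repeats=5, seed=0):
    """Average `evaluate(train, test)` over several random train/test splits."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    scores = []
    for _ in range(repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        scores.append(evaluate(shuffled[:cut], shuffled[cut:]))
    return statistics.mean(scores)


def placeholder_score(train, test):
    """Stand-in for a real accuracy measure: reports the test-set share."""
    return len(test) / (len(train) + len(test))


data = list(range(100))  # stand-in for the encoded names
for fraction in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(fraction, repeated_holdout(data, fraction, placeholder_score))
```

In a real experiment, `placeholder_score` would be replaced by a function that trains the clustering model on `train` and returns the share of correctly clustered names in `test`.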

The model is tested on two sets: the remaining samples from the first set and a new set consisting of 1000 names.

For both clustering algorithms, correct clustering rates of 56% and 75% are obtained for test set #1 and test set #2, at training percentages of 70% and 80% respectively.

Both clustering algorithms achieve an average performance of 64% across the two test sets when 70% of the data is used for training.

American Psychological Association (APA) citation style

Hana, M. A. 2006. Resolving cross-language problem in Arabic proper name transliteration. International Journal of Intelligent Computing and Information Sciences, Vol. 6, no. 2.
https://search.emarefa.net/detail/BIM-284211

Modern Language Association (MLA) citation style

Hana, M. A. Resolving cross-language problem in Arabic proper name transliteration. International Journal of Intelligent Computing and Information Sciences Vol. 6, no. 2 (Jul. 2006).
https://search.emarefa.net/detail/BIM-284211

American Medical Association (AMA) citation style

Hana, M. A. Resolving cross-language problem in Arabic proper name transliteration. International Journal of Intelligent Computing and Information Sciences. 2006. Vol. 6, no. 2.
https://search.emarefa.net/detail/BIM-284211

Data type

Articles

Text language

English

Notes

Includes bibliographical references.

Record number

BIM-284211