Resolving cross-language problem in Arabic proper name transliteration

Author

Hana, M. A.

Source

International Journal of Intelligent Computing and Information Sciences

Issue

Vol. 6, Issue 2 (31 Jul. 2006), 9 p.

Publisher

Ain Shams University Faculty of Computer and Information Sciences

Publication Date

2006-07-31

Country of Publication

Egypt

No. of Pages

9

Main Subjects

Information Technology and Computer Science

Abstract EN

The aim of this study is to develop a machine learning algorithm that clusters multiple versions or scripts of Arabic proper names.

The proposed model is useful for unique subject identification, machine translation, cross-language information retrieval, and speech synthesis.

Proper names are represented as a binary matrix that preserves the order of the alphabet.
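
The abstract does not give the exact layout of the binary matrix. One plausible reading, sketched here as an assumption rather than the author's encoding, is a one-hot matrix with one row per letter of the name and one column per letter of the alphabet, columns kept in alphabetical order. The function name and the choice of the 26-letter Latin alphabet are illustrative.

```python
import string

# Assumption: names are already transliterated into the 26-letter Latin alphabet.
ALPHABET = string.ascii_lowercase


def name_to_binary_matrix(name, alphabet=ALPHABET):
    """Return one row per letter of `name`.

    Each row holds a single 1 in the column of that letter's position
    in `alphabet`, so the columns keep alphabetical order.
    """
    matrix = []
    for ch in name.lower():
        if ch not in alphabet:
            continue  # skipping spaces, hyphens, etc. is an assumption
        row = [0] * len(alphabet)
        row[alphabet.index(ch)] = 1
        matrix.append(row)
    return matrix


m = name_to_binary_matrix("Omar")
print(len(m), "rows x", len(m[0]), "columns")
```

Flattening the rows into one long vector gives a point that a distance-based clustering algorithm can work with.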

Then, two stages of preprocessing are applied.

The first preprocessing step converts the data into a single form, and the second makes all names the same length.
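
The two preprocessing steps could look roughly like the sketch below. The abstract does not say what the unified form is or how lengths are equalized, so lowercasing, dropping non-letters, the target length of 10 and the padding character are all assumptions chosen for illustration.

```python
def unify_form(name):
    """Step 1 (assumed): lowercase and keep only the letters a-z."""
    return "".join(ch for ch in name.lower() if "a" <= ch <= "z")


def unify_length(name, length=10, pad="_"):
    """Step 2 (assumed): truncate or right-pad to exactly `length` characters."""
    return name[:length].ljust(length, pad)


def preprocess(name, length=10):
    """Apply both steps in order."""
    return unify_length(unify_form(name), length)


for raw in ["Omar", "Abd El-Rahman"]:
    print(repr(raw), "->", repr(preprocess(raw)))
```

With a fixed length, every name maps to a matrix of the same shape, which is what a Euclidean distance between names requires.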

The k-means algorithm is used for clustering, along with a simple supervised clustering algorithm based on the Euclidean distance.
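
As a rough illustration of the two approaches, the sketch below implements plain k-means and, for the supervised variant, a nearest-centroid rule: each labeled group of training vectors is averaged, and a new vector is assigned to the group whose average is closest in Euclidean distance. The abstract does not describe the supervised algorithm in detail, so the nearest-centroid rule is an assumption, as are all function names and the toy data.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: euclidean(point, centroids[i]))


def k_means(points, k, iterations=20, seed=0):
    """Unsupervised: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def fit_centroids(points, labels):
    """Supervised (assumed nearest-centroid rule): one mean vector per label."""
    by_label = {}
    for p, y in zip(points, labels):
        by_label.setdefault(y, []).append(p)
    return {y: mean(ps) for y, ps in by_label.items()}


def predict(point, label_centroids):
    return min(label_centroids, key=lambda y: euclidean(point, label_centroids[y]))


# Toy 2-D vectors standing in for flattened name matrices.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels = ["a", "a", "b", "b"]
print(sorted(k_means(points, 2)))
print(predict([1, 1], fit_centroids(points, labels)))
```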

The data set consists of 725 unique Arabic names transliterated into English, which total 1340 names when their variant spellings are included.

Five experiments were conducted, differing in the share of the data used for training: 50%, 60%, 70%, 80% and 90%.

For each experiment, the training samples are randomly selected five times and the average results are reported.

The model is tested on two sets: the remaining samples from the first set and a new set consisting of 1000 names.

For both clustering algorithms, 56% and 75% correct clustering are obtained for test set #1 and test set #2, with 70% and 80% of the data used for training, respectively.

Both clustering algorithms average 64% correct clustering across the two test sets when 70% of the data is used for training.

American Psychological Association (APA)

Hana, M. A. 2006. Resolving cross-language problem in Arabic proper name transliteration. International Journal of Intelligent Computing and Information Sciences, Vol. 6, no. 2.
https://search.emarefa.net/detail/BIM-284211

Modern Language Association (MLA)

Hana, M. A. Resolving cross-language problem in Arabic proper name transliteration. International Journal of Intelligent Computing and Information Sciences Vol. 6, no. 2 (Jul. 2006).
https://search.emarefa.net/detail/BIM-284211

American Medical Association (AMA)

Hana, M. A. Resolving cross-language problem in Arabic proper name transliteration. International Journal of Intelligent Computing and Information Sciences. 2006. Vol. 6, no. 2.
https://search.emarefa.net/detail/BIM-284211

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references.

Record ID

BIM-284211