Resolving cross-language problem in Arabic proper name transliteration
Author
Hana, M. A.
Source
International Journal of Intelligent Computing and Information Sciences
Issue
Vol. 6, Issue 2 (31 Jul. 2006), 9 p.
Publisher
Ain Shams University Faculty of Computer and Information Sciences
Publication Date
2006-07-31
Country of Publication
Egypt
No. of Pages
9
Main Subjects
Information Technology and Computer Science
Topics
Abstract EN
The aim of this study is to develop a machine learning algorithm that clusters multiple versions or scripts of Arabic proper names.
The proposed model is intended for unique subject identification, machine translation, cross-language information retrieval and speech synthesis.
Proper names are represented as a binary matrix that preserves the order of the alphabet.
Then, two stages of preprocessing are applied.
The first preprocessing step unifies the data in one form and the second step unifies the name length.
The k-means algorithm is used for clustering, along with a simple supervised clustering algorithm based on the Euclidean distance.
The data set consists of 725 unique Arabic names transliterated into English, totaling 1340 names when their variant versions are included.
Five experiments were conducted, differing in the percentage of the data used for training: 50%, 60%, 70%, 80% and 90%.
For each experiment, the training samples are randomly selected, the selection is repeated five times, and the average results are reported.
The model is tested on two sets: the remaining samples from the first set and a new set consisting of 1000 names.
For both clustering algorithms, correct clustering rates of 56% and 75% are obtained for testing set #1 and testing set #2 at training percentages of 70% and 80%, respectively.
Both clustering algorithms achieve an average performance of 64% across the two testing sets when 70% of the data is used for training.
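The pipeline described in the abstract (binary matrix encoding, length unification, Euclidean-distance assignment) can be illustrated with a minimal Python sketch. It is not the author's implementation: the 26-letter Latin alphabet, the fixed length MAX_LEN, the zero-row padding, the nearest-centroid rule and the toy name variants are all assumptions made for illustration, and the k-means stage is not shown.

```python
import math
import string

ALPHABET = string.ascii_lowercase  # assumption: names use the 26 Latin letters
MAX_LEN = 10                       # assumption: unified name length


def normalize(name, max_len=MAX_LEN):
    """Unify the form (lowercase, letters only) and cap the length."""
    letters = [ch for ch in name.lower() if ch in ALPHABET]
    return "".join(letters)[:max_len]


def encode(name, max_len=MAX_LEN):
    """Flatten a max_len x 26 binary matrix; row i is a one-hot of letter i."""
    vector = [0] * (max_len * len(ALPHABET))
    for row, ch in enumerate(normalize(name, max_len)):
        vector[row * len(ALPHABET) + ALPHABET.index(ch)] = 1
    return vector  # rows past the end of the name stay all-zero (padding)


def centroid(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def nearest_cluster(name, centroids):
    """Assign a name to the cluster whose centroid is closest (Euclidean)."""
    vector = encode(name)
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Hypothetical labeled training variants, not data from the paper.
training = {
    "mohamed": ["Mohamed", "Mohammed", "Muhammad"],
    "ahmed": ["Ahmed", "Ahmad"],
}
centroids = {
    label: centroid([encode(variant) for variant in variants])
    for label, variants in training.items()
}

print(nearest_cluster("Mohamad", centroids))
print(nearest_cluster("Ahmid", centroids))
```

In this sketch an unseen spelling is assigned to the cluster whose mean vector is closest. Because the encoding is positional, an inserted or dropped letter shifts every later row, which is one reason a length-unifying preprocessing step matters for this kind of representation.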
American Psychological Association (APA)
Hana, M. A. 2006. Resolving cross-language problem in Arabic proper name transliteration. International Journal of Intelligent Computing and Information Sciences, Vol. 6, no. 2.
https://search.emarefa.net/detail/BIM-284211
Modern Language Association (MLA)
Hana, M. A. Resolving cross-language problem in Arabic proper name transliteration. International Journal of Intelligent Computing and Information Sciences Vol. 6, no. 2 (Jul. 2006).
https://search.emarefa.net/detail/BIM-284211
American Medical Association (AMA)
Hana, M. A. Resolving cross-language problem in Arabic proper name transliteration. International Journal of Intelligent Computing and Information Sciences. 2006. Vol. 6, no. 2.
https://search.emarefa.net/detail/BIM-284211
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references.
Record ID
BIM-284211