Improving the accuracy of English-Arabic statistical sentence alignment

Joint Authors

Salameh, Muhammad
Zantout, Rashid
Mansur, Nashat

Source

The International Arab Journal of Information Technology

Issue

Vol. 8, Issue 2 (30 Apr. 2011), pp. 171-177, 7 p.

Publisher

Zarqa University

Publication Date

2011-04-30

Country of Publication

Jordan

No. of Pages

7

Main Subjects

Information Technology and Computer Science

Abstract EN

Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output.

Parallel corpora are the basic building block for training a statistical natural language processing system and for creating translation and language models.

Several systems have been devised that automatically align the words of a pair of sentences, one in each language.

Such systems have been used successfully with European languages.

In this paper, one such system is used to align sentences in an English-Arabic corpus.

The system performs poorly when given raw, unaligned English-Arabic sentence pairs.

This prompted the development of a preprocessing step to be applied to the Arabic sentences.

The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted on the preprocessed sentences.

American Psychological Association (APA)

Salameh, Muhammad & Zantout, Rashid & Mansur, Nashat. 2011. Improving the accuracy of English-Arabic statistical sentence alignment. The International Arab Journal of Information Technology, Vol. 8, no. 2, pp. 171-177.
https://search.emarefa.net/detail/BIM-249568

Modern Language Association (MLA)

Salameh, Muhammad…[et al.]. Improving the accuracy of English-Arabic statistical sentence alignment. The International Arab Journal of Information Technology Vol. 8, no. 2 (Apr. 2011), pp. 171-177.
https://search.emarefa.net/detail/BIM-249568

American Medical Association (AMA)

Salameh, Muhammad & Zantout, Rashid & Mansur, Nashat. Improving the accuracy of English-Arabic statistical sentence alignment. The International Arab Journal of Information Technology. 2011. Vol. 8, no. 2, pp. 171-177.
https://search.emarefa.net/detail/BIM-249568

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: p. 176-177

Record ID

BIM-249568