On the entropy of Arabic
Author
al-Suwayl, M. I.
Source
The Arabian Journal for Science and Engineering
Issue
Vol. 16, Issue 4(s) (31 Oct. 1991), pp.557-563, 7 p.
Publisher
King Fahd University of Petroleum and Minerals
Publication Date
1991-10-31
Country of Publication
Saudi Arabia
No. of Pages
7
Main Subjects
Information Technology and Computer Science
Abstract EN
The entropy of a language is a parameter with important applications in cryptography, coding, and natural language processing.
Entropies for languages such as English have been estimated to reasonable levels of accuracy due to the availability of extensive statistics for the letter and word frequencies of these languages.
Similar statistics for Arabic became available only recently and have since been used to estimate the entropy of Arabic.
This paper reviews two approaches for estimating the entropy of Arabic, and presents a new one.
The first approach estimates the entropy from letter frequencies; the second bases its estimate on word frequencies and root counts.
The new approach estimates the entropy from a proposed method for upper-bounding the number of meaningful Arabic sentences that are n letters long.
The new method results in estimates that are close to those given by higher order entropies of Arabic as computed by some researchers, but with much less computational effort.
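As an illustration of the quantities the abstract mentions, here is a minimal Python sketch. It is not the paper's procedure, which this record does not reproduce. It assumes the standard Shannon formula for a first-order (single-letter) entropy, and the general counting fact that if at most N meaningful sentences of n letters exist, the entropy per letter is at most log2(N)/n. The function names and sample inputs are hypothetical.

```python
import math
from collections import Counter


def first_order_entropy(text):
    """Shannon entropy in bits per letter, using single-letter frequencies only."""
    # Keep alphabetic characters; spaces, digits, and diacritic marks are skipped.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        raise ValueError("text contains no letters")
    counts = Counter(letters)
    total = len(letters)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_bound_from_count(num_sentences, n):
    """Upper bound in bits per letter, given at most num_sentences
    meaningful sentences that are n letters long (generic counting bound)."""
    if num_sentences < 1 or n < 1:
        raise ValueError("num_sentences and n must be positive")
    return math.log2(num_sentences) / n


# Toy inputs, chosen only to make the arithmetic easy to follow.
print(first_order_entropy("ابجد"))        # four equally likely letters
print(entropy_bound_from_count(1024, 5))  # log2(1024) / 5
```

A first-order estimate ignores dependencies between neighbouring letters, so it generally overstates the entropy of real text; that is why the abstract compares the new estimates against higher-order entropies.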
American Psychological Association (APA)
al-Suwayl, M. I. 1991. On the entropy of Arabic. The Arabian Journal for Science and Engineering, Vol. 16, no. 4(s), pp.557-563.
https://search.emarefa.net/detail/BIM-598609
Modern Language Association (MLA)
al-Suwayl, M. I. On the entropy of Arabic. The Arabian Journal for Science and Engineering Vol. 16, no. 4 (Oct. 1991), pp.557-563.
https://search.emarefa.net/detail/BIM-598609
American Medical Association (AMA)
al-Suwayl, M. I. On the entropy of Arabic. The Arabian Journal for Science and Engineering. 1991. Vol. 16, no. 4(s), pp.557-563.
https://search.emarefa.net/detail/BIM-598609
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references: p. 563
Record ID
BIM-598609