Multilayer model for Arabic text compression

Author

Ujan, Arafat

Source

The International Arab Journal of Information Technology

Issue

Vol. 8, Issue 2 (30 Apr. 2011), pp.188-196, 9 p.

Publisher

Zarqa University

Publication Date

2011-04-30

Country of Publication

Jordan

No. of Pages

9

Main Subjects

Information Technology and Computer Science

Abstract EN

This article describes a multilayer model-based approach for text compression.

It uses linguistic information to develop a multilayer decomposition model of the text in order to achieve better compression.

This new approach is illustrated for the case of the Arabic language, where the majority of words are generated according to the Semitic root-and-pattern scheme.

Text is split into three linguistically homogeneous layers representing the three categories of words: derivative, non-derivative, and functional words.

A fourth layer, called the Mask, is introduced to aid reconstruction of the original text from the three layers on the decoding side.
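
The layer-and-Mask decomposition can be illustrated with a minimal sketch. It is not the article's implementation: the toy lexicons `DERIVATIVE` and `FUNCTIONAL`, the helper names `classify`, `split_layers`, and `merge_layers`, and the encoding of the Mask as a list of layer indices are all assumptions made for illustration. A real system would classify words with a morphological analyser, and this sketch normalizes whitespace to single spaces.

```python
from typing import List, Set, Tuple

# Hypothetical toy lexicons (assumption); a real system would rely on
# morphological analysis rather than fixed word sets.
DERIVATIVE: Set[str] = {"كاتب", "مكتوب", "مكتبة"}
FUNCTIONAL: Set[str] = {"في", "من", "على"}

# Layer indices: 0 = derivative, 1 = non-derivative, 2 = functional.


def classify(word: str) -> int:
    """Return the index of the layer a word belongs to."""
    if word in DERIVATIVE:
        return 0
    if word in FUNCTIONAL:
        return 2
    return 1


def split_layers(text: str) -> Tuple[List[List[str]], List[int]]:
    """Split text into three word layers plus a mask of layer indices."""
    layers: List[List[str]] = [[], [], []]
    mask: List[int] = []
    for word in text.split():
        idx = classify(word)
        layers[idx].append(word)
        mask.append(idx)  # the mask records which layer supplied each word
    return layers, mask


def merge_layers(layers: List[List[str]], mask: List[int]) -> str:
    """Rebuild the word sequence by reading each layer in mask order."""
    cursors = [0, 0, 0]
    words: List[str] = []
    for idx in mask:
        words.append(layers[idx][cursors[idx]])
        cursors[idx] += 1
    return " ".join(words)
```

Because the mask stores only the layer index of each word, the decoder can interleave the three layers back into the original word order without any other positional information.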

Suitable compression techniques are then applied to the different layers in order to maximize the compression ratio.
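
As a rough sketch of compressing each layer independently, the snippet below applies one general-purpose codec (`zlib`) to every layer. This is a stand-in only: the abstract does not name the techniques used per layer, and the function names `compress_layers` and `decompress_layers` and the layer keys are hypothetical.

```python
import zlib
from typing import Dict


def compress_layers(layers: Dict[str, bytes]) -> Dict[str, bytes]:
    """Compress each layer on its own; zlib is a placeholder codec."""
    return {name: zlib.compress(data, 9) for name, data in layers.items()}


def decompress_layers(blobs: Dict[str, bytes]) -> Dict[str, bytes]:
    """Invert compress_layers, restoring the raw bytes of every layer."""
    return {name: zlib.decompress(blob) for name, blob in blobs.items()}


# Hypothetical layer contents: newline-separated words encoded as UTF-8,
# and a mask stored as one byte per word.
example_layers: Dict[str, bytes] = {
    "derivative": "كاتب\nمكتبة".encode("utf-8"),
    "non_derivative": "القاهرة".encode("utf-8"),
    "functional": "في".encode("utf-8"),
    "mask": bytes([0, 2, 0, 1]),
}

restored = decompress_layers(compress_layers(example_layers))
```

Keeping the layers as separate streams is what would allow a different, better-suited coder to be swapped in for each one.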

The proposed method has been evaluated in terms of the rate of compression it provides and its time efficiency.

Results on real texts are presented to illustrate the performance of the new approach.

The novelties of the compression technique presented in this article are that (1) the morphological structure of words may be used to support better compression and to improve the performance of traditional compression techniques; (2) word search can be done directly on the compressed text through the appropriate layer; and (3) applications such as text mining and document classification can be performed directly on the compressed texts.

American Psychological Association (APA)

Ujan, Arafat. 2011. Multilayer model for Arabic text compression. The International Arab Journal of Information Technology, Vol. 8, no. 2, pp. 188-196.
https://search.emarefa.net/detail/BIM-249570

Modern Language Association (MLA)

Ujan, Arafat. Multilayer model for Arabic text compression. The International Arab Journal of Information Technology Vol. 8, no. 2 (Apr. 2011), pp.188-196.
https://search.emarefa.net/detail/BIM-249570

American Medical Association (AMA)

Ujan, Arafat. Multilayer model for Arabic text compression. The International Arab Journal of Information Technology. 2011. Vol. 8, no. 2, pp.188-196.
https://search.emarefa.net/detail/BIM-249570

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: p. 195-196

Record ID

BIM-249570