Hybrid technique for Arabic text compression

Other Title(s)

تقنية هجينة لضغط النصوص العربية

Dissertant

Abu Jrai, Inas Mahmud

Thesis advisor

Ujan, Arafat

Committee Members

Kayid, Ahmad
al-Atum, Jalal

University

Middle East University

Faculty

Faculty of Information Technology

Department

Computer Science Department

University Country

Jordan

Degree

Master

Degree Date

2013

English Abstract

Compression techniques have gained great importance in communications and information technology because they reduce the growing size of data and increase data transmission speed between computers and over networks.

In addition to these aims, text compression techniques aim to make the compressed text usable in text-oriented applications such as searching, summarizing, and information retrieval.

In this work, the morphological features of the Arabic language were used to build a new Arabic text compression technique that satisfies two objectives: obtaining a good compression rate, and producing a compressed text structure from which information can be extracted directly, instead of from the original file.

In addition to reducing the text size and increasing the transmission speed, such techniques speed up information extraction, particularly search processes.

Two common compression techniques, Lempel-Ziv-Welch (LZW) and the Burrows-Wheeler Transform (BWT), were tested on Arabic texts, and their results were compared in terms of compression ratio.

LZW performed best for all categories of Arabic text, followed by BWT.
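For readers unfamiliar with LZW, the following is a minimal textbook sketch in Python, not the thesis implementation. It assumes the Arabic text is stored in a single-byte encoding (Windows-1256 is used in the usage note purely as an example) and that codes are kept as plain integers rather than packed into bits; the function names are illustrative.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW dictionary codes."""
    # The dictionary starts with all 256 single-byte strings.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the current phrase while it is known.
            current = candidate
        else:
            # Emit the longest known phrase and learn the new one.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the phrase that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

A call such as `lzw_compress("بسم الله الرحمن الرحيم ".encode("cp1256"))` returns the code list, and `lzw_decompress` restores the bytes. One simple way to estimate a compression ratio from this sketch is to compare the number of emitted codes, multiplied by an assumed code width, with the original size in bits; the abstract does not state which definition of compression ratio the thesis uses.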

Features of the Arabic language were studied and then exploited to improve the performance of these techniques.

The fact that Arabic letters have a single case was used to improve the performance of LZW.

By exploiting the unused locations of the dictionary, the proposed method achieved a better compression ratio than all the other techniques.
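The abstract does not say which dictionary locations are unused or what is stored in them. One plausible reading, sketched below in Python, is that in a single-byte encoding pure Arabic text never uses the 52 slots reserved for upper-case and lower-case Latin letters, so those slots of the initial LZW dictionary can hold frequent Arabic letter pairs instead. Everything here is an assumption made for illustration: the Windows-1256 encoding, the `SEED_PAIRS` list, and the function names are hypothetical and are not taken from the thesis.

```python
import string

ENCODING = "cp1256"  # assumed single-byte Arabic code page
# Hypothetical seed list of common Arabic letter pairs (not from the thesis).
SEED_PAIRS = ["ال", "لا", "في", "من", "ين"]


def build_seed_table(pairs=SEED_PAIRS) -> dict[bytes, int]:
    """Initial dictionary in which Latin-letter slots hold Arabic pairs."""
    table = {bytes([i]): i for i in range(256)}
    # Byte values of a-z and A-Z, which pure Arabic text does not use.
    free_slots = [ord(c) for c in string.ascii_letters]
    for slot, pair in zip(free_slots, pairs):
        del table[bytes([slot])]
        table[pair.encode(ENCODING)] = slot
    return table


def lzw_compress_seeded(data: bytes, seed: dict[bytes, int]) -> list[int]:
    """LZW encoding that starts from the pre-seeded dictionary."""
    if any(bytes([b]) not in seed for b in set(data)):
        raise ValueError("input uses a byte whose slot was reassigned")
    table = dict(seed)
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress_seeded(codes: list[int], seed: dict[bytes, int]) -> bytes:
    """Inverse of lzw_compress_seeded; needs the same seed table."""
    if not codes:
        return b""
    table = {code: phrase for phrase, code in seed.items()}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

With this seeding, a pair such as "ال" is emitted as a single code from the first occurrence, whereas plain LZW has to see it once before it can code it as a unit. The trade-off in this sketch is that the input may no longer contain Latin letters, which is why the encoder rejects such input.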

The morphological features of the Arabic language were used as a pre-processing step for data compression.

As a result of this study, a new hybrid technique is proposed that gives better results in terms of compression rate and produces searchable compressed text files.

This technique works in phases.

In the first phase, the text file is split into four different files using a Multilayer analyzer.

In the second phase, each one of these four files is compressed using BWT.
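As background for the second phase, the Burrows-Wheeler Transform can be sketched as follows in Python. This is a naive rotation-sorting version for illustration only, not the thesis code. The abstract does not describe the contents of the four files, so the sketch works on any string, and the function names are illustrative.

```python
def bwt_transform(text: str) -> tuple[str, int]:
    """Return the last column of the sorted rotations and the row of the original."""
    n = len(text)
    if n == 0:
        return "", 0
    # Sort rotation start positions by the rotation they denote.
    rotations = sorted(range(n), key=lambda i: text[i:] + text[:i])
    # The last character of the rotation starting at i is text[i - 1].
    last_column = "".join(text[i - 1] for i in rotations)
    return last_column, rotations.index(0)


def bwt_inverse(last_column: str, primary_index: int) -> str:
    """Rebuild the original text from the last column and the primary index."""
    n = len(last_column)
    if n == 0:
        return ""
    # A stable sort of positions by character links each row of the
    # first column to the matching occurrence in the last column.
    order = sorted(range(n), key=lambda i: last_column[i])
    out = []
    row = primary_index
    for _ in range(n):
        row = order[row]
        out.append(last_column[row])
    return "".join(out)
```

The transform itself does not shrink the data. It groups similar characters together so that a later stage, typically move-to-front coding followed by an entropy coder, can compress the result well. In a pipeline like the one described here, each of the four files would be passed through such a transform and coder separately.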

Different compression techniques were investigated and tested at the level of each one of the four files.

The BWT technique was found to be suitable for all text files in terms of compression ratio.

The integration of the Multilayer model with LZW to compress all the files reduced the compression time.

Main Subjects

Information Technology and Computer Science

No. of Pages

69

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction.

Chapter Two : Literature survey.

Chapter Three : Framework design and implementation.

Chapter Four : Experimental evaluation.

Chapter Five : Conclusion and future work.

References.

American Psychological Association (APA)

Abu Jrai, Inas Mahmud. (2013). Hybrid technique for Arabic text compression. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-699373

Modern Language Association (MLA)

Abu Jrai, Inas Mahmud. Hybrid technique for Arabic text compression. (Master's thesis). Middle East University. (2013).
https://search.emarefa.net/detail/BIM-699373

American Medical Association (AMA)

Abu Jrai, Inas Mahmud. (2013). Hybrid technique for Arabic text compression. (Master's thesis). Middle East University, Jordan
https://search.emarefa.net/detail/BIM-699373

Language

English

Data Type

Arab Theses

Record ID

BIM-699373