Development and performance evaluation of a bit-level text compression scheme based on the adaptive character word length algorithm

Other Title(s)

تطوير و تقييم مبدأ لضغط الملفات النصية يعتمد على خوارزمية طول الزمن المكيف

Dissertant

al-Hayik, Wiam Yahya

Thesis advisor

al-Bahadili, Husayn

Committee Members

al-Hamami, Ala H.
al-Bashir, Umar

University

Amman Arab University

Faculty

College of Computer Sciences and Informatics

Department

Department of Computer Science

University Country

Jordan

Degree

Master

Degree Date

2008

English Abstract

The adaptive character wordlength (ACW) algorithm is a recently proposed bit-level, lossless, adaptive, and asymmetric text compression algorithm.

In this algorithm, the binary sequence is divided into a number of blocks (B), each n bits long (n>8).

Each block therefore takes a decimal value in the range 0 to 2^n - 1.

If the number of distinct decimal values (d) does not exceed 256 (d≤256), each n-bit block can be represented by an 8-bit code, so the binary sequence can be compressed using an n-bit character wordlength rather than the standard 8-bit character wordlength.

Thus, a compression ratio of approximately n/8 can be achieved.
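The basic ACW(n) step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the zero-padding of the last block, and the first-appearance ordering of the code table are all assumptions made for illustration, and the header that a real compressed file would need is left out.

```python
def acw_compress(bits: str, n: int):
    """Recode n-bit blocks as 8-bit codes; return None when d > 256."""
    # Assumption: pad with zeros so the length is a multiple of n.
    padded = bits + "0" * (-len(bits) % n)
    blocks = [int(padded[i:i + n], 2) for i in range(0, len(padded), n)]
    # Distinct block values, in order of first appearance (d = len(table)).
    table = list(dict.fromkeys(blocks))
    if len(table) > 256:
        return None  # the sequence cannot be compressed with this n
    index = {value: code for code, value in enumerate(table)}
    codes = bytes(index[b] for b in blocks)  # one byte per n-bit block
    return table, codes


def acw_decompress(table, codes, n: int, bit_length: int) -> str:
    """Expand each 8-bit code back to its n-bit block and drop the padding."""
    bits = "".join(format(table[c], f"0{n}b") for c in codes)
    return bits[:bit_length]


bits = "101100111000" * 10              # 120 bits, a single distinct 12-bit block
table, codes = acw_compress(bits, 12)
ratio = len(bits) / (8 * len(codes))    # 1.5, i.e. n/8 with the header excluded
restored = acw_decompress(table, codes, 12, len(bits))
```

The ratio of exactly n/8 only holds while the header is ignored; any stored code table reduces it.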

Since the compression ratio is a function of n, this algorithm is referred to as ACW(n).

Using the ACW(n) algorithm raises a number of issues that may degrade its performance and need careful consideration: (i) if d>256, the binary sequence cannot be compressed using an n-bit character wordlength; (ii) the probability of successfully compressing the binary sequence using an n-bit character wordlength is inversely proportional to n; and (iii) finding the optimum value of n that provides the maximum compression ratio is sometimes a time-consuming process, especially for large binary sequences.

In this thesis, we develop an efficient implementation scheme to overcome the issues mentioned above that may degrade the performance of the ACW(n) algorithm.

In this new scheme, the binary sequence is subdivided into a number of subsequences (s), each of which satisfies the condition d≤256.

Therefore, we refer to this new scheme as the ACW(n,s) scheme.
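One way to picture the subdivision is a greedy cut: keep adding blocks to the current subsequence until one more new value would push d above 256, then start a new subsequence. The abstract does not say how the thesis chooses the cut points, so the greedy rule and the name `split_subsequences` below are assumptions for illustration only.

```python
def split_subsequences(blocks, limit=256):
    """Greedily cut block values into runs with at most `limit` distinct values."""
    subsequences, current, seen = [], [], set()
    for value in blocks:
        # A new value that would exceed the limit closes the current run.
        if value not in seen and len(seen) == limit:
            subsequences.append(current)
            current, seen = [], set()
        seen.add(value)
        current.append(value)
    if current:
        subsequences.append(current)
    return subsequences


# 600 distinct block values split into runs of 256, 256 and 88 values.
parts = split_subsequences(list(range(600)))
sizes = [len(p) for p in parts]   # [256, 256, 88]
```

Each resulting run can then be recoded with 8-bit codes on its own, at the cost of one code table per run.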

So that the codes are not confused during decompression, some information must be stored in the compressed file header.

The size of the header is directly proportional to n and s; therefore, an optimization mechanism is required to find the values of n and s that yield the maximum compression ratio.
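The trade-off between header size and payload size can be sketched as a brute-force search over n. The header layout used here is hypothetical (8 bits for n, then per subsequence a 32-bit block count plus d values of n bits each), as are the greedy split and all function names; the thesis's actual header format and optimization mechanism are not given in this abstract.

```python
def to_blocks(bits, n):
    """Split a bit string into n-bit integers, zero-padding the tail (assumption)."""
    padded = bits + "0" * (-len(bits) % n)
    return [int(padded[i:i + n], 2) for i in range(0, len(padded), n)]


def split_subsequences(blocks, limit=256):
    """Greedy split into runs with at most `limit` distinct values (assumption)."""
    subsequences, current, seen = [], [], set()
    for value in blocks:
        if value not in seen and len(seen) == limit:
            subsequences.append(current)
            current, seen = [], set()
        seen.add(value)
        current.append(value)
    if current:
        subsequences.append(current)
    return subsequences


def estimated_size(bits, n, count_bits=32):
    """Return (estimated compressed size in bits, s) under the hypothetical header."""
    subs = split_subsequences(to_blocks(bits, n))
    header = 8 + sum(count_bits + len(set(sub)) * n for sub in subs)
    payload = 8 * sum(len(sub) for sub in subs)   # one byte per block
    return header + payload, len(subs)


def best_wordlength(bits, candidates=range(9, 17)):
    """Pick the n with the smallest estimated size among the candidates."""
    return min(candidates, key=lambda n: estimated_size(bits, n)[0])


bits = "1011001110001111" * 200           # 3200 bits with a 16-bit period
n_best = best_wordlength(bits)            # 16
size, s = estimated_size(bits, n_best)    # (1656, 1)
```

Under this model the header grows with both n and the number of subsequences, which matches the proportionality stated above, but the constants are illustrative only.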

To enhance the performance of the ACW(n,s) scheme, a new adaptive text-to-binary coding format is developed.

In this coding format, an uncompressed character is coded to binary according to its probability of occurrence, rather than its equivalent ASCII code.

This coding format reduces the entropy of the binary sequence and therefore yields a higher compression ratio.
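A simple way to illustrate frequency-based coding is to rank characters by how often they occur and give the most frequent character the code 0, the next the code 1, and so on, each written as 8 bits. Whether the thesis uses exactly this rank-to-code mapping is an assumption, as are the function names; the sketch also assumes the text has at most 256 distinct characters.

```python
from collections import Counter


def adaptive_code_table(text):
    """Map each character to its frequency rank (0 = most frequent)."""
    counts = Counter(text)
    # Ties are broken by character value so the table is deterministic.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: rank for rank, ch in enumerate(ranked)}


def text_to_bits(text, table):
    """Write each character as the 8-bit binary form of its rank."""
    return "".join(format(table[ch], "08b") for ch in text)


text = "aaaabbbc"
table = adaptive_code_table(text)              # {'a': 0, 'b': 1, 'c': 2}
adaptive_bits = text_to_bits(text, table)
ascii_bits = "".join(format(ord(ch), "08b") for ch in text)
ones_adaptive = adaptive_bits.count("1")       # 4
ones_ascii = ascii_bits.count("1")             # 25
```

In this small example the rank-based stream has far fewer 1 bits than the ASCII stream, so its bit distribution is more skewed, which is the kind of effect the entropy claim refers to.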

To evaluate the performance of the ACW(n,s) algorithm, it is implemented in the C++ programming language and used to compress a number of text files from standard corpora (e.g., Calgary corpus, Canterbury corpus, Artificial corpus, Large corpus, and Miscellaneous corpus).

The results obtained are presented in tables and graphs.

They are also discussed and compared with the results of many widely used compression algorithms and state-of-the-art software.

The results obtained demonstrate that the ACW(n,s) scheme achieves a higher compression ratio than many widely used compression algorithms and performs competitively with state-of-the-art software.

Finally, conclusions are drawn and recommendations for future work are given.

Main Subjects

Telecommunications Engineering

No. of Pages

86

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One : Introduction.

Chapter Two : Literature review.

Chapter Three : The adaptive character wordlength (ACW(n)) algorithm.

Chapter Four : The adaptive character wordlength (ACW(n,s)) algorithm.

Chapter Five : Experimental results and discussion.

Chapter Six : Conclusions and recommendations for future work.

References.

American Psychological Association (APA)

al-Hayik, Wiam Yahya. (2008). Development and performance evaluation of a bit-level text compression scheme based on the adaptive character word length algorithm. (Master's thesis). Amman Arab University, Jordan
https://search.emarefa.net/detail/BIM-529360

Modern Language Association (MLA)

al-Hayik, Wiam Yahya. Development and performance evaluation of a bit-level text compression scheme based on the adaptive character word length algorithm. (Master's thesis). Amman Arab University. (2008).
https://search.emarefa.net/detail/BIM-529360

American Medical Association (AMA)

al-Hayik, Wiam Yahya. (2008). Development and performance evaluation of a bit-level text compression scheme based on the adaptive character word length algorithm. (Master's thesis). Amman Arab University, Jordan
https://search.emarefa.net/detail/BIM-529360

Language

English

Data Type

Arab Theses

Record ID

BIM-529360