A novel web search engine model based on index-query bit-level compression

Dissertant

Sab, Sayf Mahmud

Thesis advisor

al-Bahadili, Husayn

Committee Members

al-Shaykh, Isam
al-Zayyat, Khalid
Hattab, Izz al-Din Shakir Hasan

University

Arab Academy for Financial and Banking Sciences

Faculty

The Faculty of Information Systems and Technology

Department

Computer Information Systems

University Country

Jordan

Degree

Ph.D.

Degree Date

2011

English Abstract

A Web search engine is an information retrieval system designed to help users find information stored on the Web.

A standard Web search engine consists of three main components: a Web crawler, a document analyzer and indexer, and a search processor.

Due to the rapid growth in the size of the Web, Web search engines face enormous performance challenges in terms of storage capacity, data retrieval rate, query processing time, and communication overhead.

Large search engines, in particular, have to be able to process tens of thousands of queries per second on tens of billions of documents, making query throughput a critical issue.

To sustain this heavy workload, search engines use a variety of performance optimizations, including succinct data structures, compressed text indexing, query optimization, high-speed processing and communication systems, and efficient search engine architectural design.

However, the performance of current Web search engine models is believed to still fall short of meeting the needs of users and applications. In this work, we develop a novel Web search engine model based on index-query compression; it is therefore referred to as the compressed index-query (CIQ) model. The model incorporates two compression layers, both implemented at the back-end processor (server) side: the first layer resides after the indexer and acts as a second compression layer to generate a double-compressed index, and the second layer is located after the query parser and compresses the query so that the compressed index can be searched with a compressed query.
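The idea of searching a compressed index with a compressed query can be illustrated with a minimal sketch. It assumes a deterministic, term-level codec, so that a query term compressed with the same codec matches the corresponding compressed index key byte for byte and no decompression is needed at search time. The codec `toy_compress` (5 bits per lowercase letter) is a hypothetical stand-in and is not the HCDC algorithm; the function names and the in-memory dictionary index are likewise assumptions, and the sketch applies a single compression step rather than the double-compressed index described in the abstract.

```python
from collections import defaultdict


def toy_compress(term: str) -> bytes:
    """Hypothetical stand-in codec (NOT HCDC): pack each letter a-z into 5 bits."""
    bits = 0
    nbits = 0
    for ch in term.lower():
        code = ord(ch) - ord("a") + 1  # 1..26; 0 is reserved for padding
        if not 1 <= code <= 26:
            raise ValueError(f"unsupported character: {ch!r}")
        bits = (bits << 5) | code
        nbits += 5
    pad = (-nbits) % 8  # zero-pad to a whole number of bytes
    bits <<= pad
    return bits.to_bytes((nbits + pad) // 8, "big")


def build_compressed_index(docs: dict[int, str]) -> dict[bytes, set[int]]:
    """Index layer: map each compressed term to the IDs of documents containing it."""
    index: dict[bytes, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[toy_compress(term)].add(doc_id)
    return dict(index)


def search(index: dict[bytes, set[int]], query: str) -> set[int]:
    """Query layer: compress each query term, then match in the compressed domain."""
    postings = [index.get(toy_compress(t), set()) for t in query.split()]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "web search engine",
    2: "web crawler",
    3: "search index compression",
}
index = build_compressed_index(docs)
print(search(index, "web search"))  # documents containing both terms
```

Because the same codec is applied on both sides, the lookup compares compressed bytes directly, which is the property a compressed index-query search relies on in this simplified setting.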

The data compression algorithm used is the novel Hamming code data compression (HCDC) algorithm. The components of the CIQ model are implemented as a number of procedures forming what is referred to as the CIQ test tool (CIQTT), which is used as a test bench to validate the accuracy and integrity of the retrieved data and to evaluate the performance of the CIQ model.
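This record does not describe how HCDC works internally. As a rough illustration of how a Hamming-code-based block compressor can operate, the sketch below splits a bit string into 7-bit blocks, stores a block that is a valid Hamming(7,4) codeword as a flag bit plus its 4 data bits, and stores any other block as a flag bit plus all 7 bits. The block size, the flag convention (`1` for valid, `0` for non-valid), the bit layout, and all function names are assumptions made for this example and are not taken from the thesis.

```python
PARITY_POS = (1, 2, 4)   # 1-indexed parity positions in a Hamming(7,4) block
DATA_POS = (3, 5, 6, 7)  # 1-indexed data positions


def syndrome(block: str) -> int:
    """XOR of the 1-indexed positions of all set bits; 0 means a valid codeword."""
    s = 0
    for pos, bit in enumerate(block, start=1):
        if bit == "1":
            s ^= pos
    return s


def encode_data(data: str) -> str:
    """Rebuild the 7-bit codeword from its 4 data bits."""
    block = ["0"] * 7
    for pos, bit in zip(DATA_POS, data):
        block[pos - 1] = bit
    s = syndrome("".join(block))
    for p in PARITY_POS:
        if s & p:  # setting this parity bit cancels bit p of the syndrome
            block[p - 1] = "1"
    return "".join(block)


def compress(bits: str) -> str:
    """Valid block -> '1' + 4 data bits; non-valid block -> '0' + 7 raw bits."""
    if len(bits) % 7:
        raise ValueError("input length must be a multiple of 7 bits")
    out = []
    for i in range(0, len(bits), 7):
        block = bits[i:i + 7]
        if syndrome(block) == 0:
            out.append("1" + "".join(block[p - 1] for p in DATA_POS))
        else:
            out.append("0" + block)
    return "".join(out)


def decompress(code: str) -> str:
    """Invert compress(): regenerate parity bits for blocks flagged as valid."""
    out = []
    i = 0
    while i < len(code):
        if code[i] == "1":
            out.append(encode_data(code[i + 1:i + 5]))
            i += 5
        else:
            out.append(code[i + 1:i + 8])
            i += 8
    return "".join(out)
```

Under these assumptions a valid block shrinks from 7 to 5 bits and a non-valid block grows from 7 to 8 bits, so the achievable ratio depends on the fraction of valid blocks in the input, with a best case of 7/5 = 1.4 for this block size.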

The results obtained demonstrate that the new CIQ model performs very well compared to the current uncompressed model. The CIQ model retrieves results with 100% agreement with the current uncompressed model. The new model also demands less disk space: the HCDC algorithm achieves a compression ratio over 1.3 with a compression efficiency of more than 95%, which implies a reduction in storage requirements of over 24%.

The new CIQ model is also faster than the current model: it achieves a speedup factor over 1.3, providing a reduction in processing time of over 24%.
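The relationship between a ratio and the corresponding percentage reduction can be checked with a short calculation. The sketch assumes the usual definitions: compression ratio = original size / compressed size, and speedup = original time / new time, so in both cases the fractional reduction is 1 - 1/ratio. The function names are hypothetical.

```python
def reduction_from_ratio(ratio: float) -> float:
    """Fractional saving implied by a compression ratio or speedup factor."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


def ratio_from_reduction(reduction: float) -> float:
    """Ratio needed to reach a given fractional saving (0 <= reduction < 1)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


print(f"{reduction_from_ratio(1.3):.3f}")   # saving at a ratio of exactly 1.3
print(f"{ratio_from_reduction(0.24):.3f}")  # ratio needed for a 24% saving
```

Under these definitions a ratio of exactly 1.3 corresponds to a saving of about 23.1%, and a 24% saving requires a ratio of about 1.32, which is consistent with the abstract reporting both figures as lower bounds ("over 1.3", "over 24%").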

Main Subjects

Information Technology and Computer Science

No. of Pages

118

Table of Contents

Table of contents.

Abstract.

Chapter one : Introduction.

Chapter two : Literature review.

Chapter three : The novel CIQ web search engine model.

Chapter four : Results and discussions.

Chapter five : Conclusions and recommendations for future work.

References.

American Psychological Association (APA)

Sab, Sayf Mahmud. (2011). A novel web search engine model based on index-query bit-level compression. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan
https://search.emarefa.net/detail/BIM-306687

Modern Language Association (MLA)

Sab, Sayf Mahmud. A novel web search engine model based on index-query bit-level compression. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences. (2011).
https://search.emarefa.net/detail/BIM-306687

American Medical Association (AMA)

Sab, Sayf Mahmud. (2011). A novel web search engine model based on index-query bit-level compression. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan
https://search.emarefa.net/detail/BIM-306687

Language

English

Data Type

Arab Theses

Record ID

BIM-306687