Improved streaming quotient filter : a duplicate detection approach for data streams

Joint Authors

Wang, Wei
Yang, Wu
Che, Shiwei

Source

The International Arab Journal of Information Technology

Issue

Vol. 17, Issue 5 (30 Sep. 2020), pp. 769-777, 9 p.

Publisher

Zarqa University Deanship of Scientific Research

Publication Date

2020-09-30

Country of Publication

Jordan

No. of Pages

9

Main Subjects

Information Technology and Computer Science

Abstract EN

The rapid development and popularization of the Internet, combined with the emergence of many modern applications such as search engines, online transactions and climate warning systems, has caused worldwide data storage to grow at an unprecedented rate.

Efficient storage, management and processing of such huge amounts of data has become an important academic research topic.

Detecting and removing duplicate and redundant data from such multi-trillion-scale datasets, while remaining efficient in both resources and computation, has become a challenging area of research.

Because potentially unbounded data streams cannot be stored in full, yet duplicate data must be removed as accurately as possible, intelligent approximate duplicate detection algorithms are urgently needed.

The literature describes many well-known methods based on bitmaps, the Bloom filter, and its variants.
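For context, the following is a minimal Python sketch of the classic Bloom-filter approach to duplicate detection that the abstract cites as prior work. The class name, bit-array size, number of hash functions, and the BLAKE2b double-hashing scheme are illustrative assumptions, not taken from the paper. Because a plain Bloom filter never clears bits, its false positive rate keeps rising as an unbounded stream is processed, which is the kind of limitation streaming-oriented structures aim to address.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter used as a duplicate detector (hypothetical parameters)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.m = num_bits                            # size of the bit array
        self.k = num_hashes                          # number of bit positions per item
        self.bits = bytearray((num_bits + 7) // 8)   # packed bit array, all zeros

    def _positions(self, item):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        d = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:], "big") | 1        # force odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def check_and_add(self, item):
        """Return True if item was (probably) seen before; then record it."""
        positions = self._positions(item)
        seen = all(self.bits[p >> 3] & (1 << (p & 7)) for p in positions)
        for p in positions:
            self.bits[p >> 3] |= 1 << (p & 7)
        return seen
```

A repeated item is always reported as a duplicate (no false negatives), but a new item can be wrongly reported as a duplicate once enough bits have been set.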

In this paper, we propose a new data structure, Improved Streaming Quotient Filter (ISQF), to efficiently detect and remove duplicate data in a data stream.

ISQF stores the signatures of elements in a data stream and uses an eviction strategy to keep error rates near zero.
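The abstract describes ISQF only at a high level (signature storage plus eviction). To illustrate that general idea, here is a hypothetical Python sketch of a bucketed, quotient-filter-style signature table: each element's fingerprint is split into a quotient (which selects a bucket) and a remainder (the stored signature), and a full bucket evicts a random signature. The class name, parameter values, BLAKE2b fingerprinting, and random eviction policy are all assumptions for illustration; this is not the ISQF design from the paper.

```python
import hashlib
import random


class BucketedSignatureFilter:
    """Hypothetical quotient-style signature table with random eviction (not ISQF)."""

    def __init__(self, num_buckets_log2=10, slots_per_bucket=4, remainder_bits=16, seed=0):
        self.q = num_buckets_log2     # quotient bits -> bucket index
        self.r = remainder_bits       # remainder bits -> stored signature
        self.k = slots_per_bucket     # capacity of each bucket
        self.buckets = [[] for _ in range(1 << self.q)]
        self.rng = random.Random(seed)

    def _split(self, item):
        # 64-bit fingerprint; top q bits = quotient, next r bits = remainder.
        h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        quotient = h >> (64 - self.q)
        remainder = (h >> (64 - self.q - self.r)) & ((1 << self.r) - 1)
        return quotient, remainder

    def seen(self, item):
        """Return True if item is reported as a duplicate; otherwise record it."""
        quotient, remainder = self._split(item)
        bucket = self.buckets[quotient]
        if remainder in bucket:
            return True               # may be a false positive (signature collision)
        if len(bucket) >= self.k:
            bucket.pop(self.rng.randrange(len(bucket)))  # evict a random signature
        bucket.append(remainder)
        return False


f = BucketedSignatureFilter()
flags = [f.seen(x) for x in ["a", "b", "a", "c", "b"]]
# The repeats of "a" and "b" are reported as duplicates.
```

In this sketch the two error types come from different places: signature collisions produce false positives, while eviction can produce false negatives (an evicted element is later treated as new). Memory stays bounded regardless of stream length, at the cost of that trade-off.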

We show that ISQF achieves near-optimal performance with fairly low memory requirements and a very low error rate, making it an efficient method for duplicate data detection.

We empirically compared ISQF with several existing methods, especially the Streaming Quotient Filter (SQF).

The results show that our proposed method outperforms the existing methods in terms of memory usage and accuracy.

We also discuss the parallel implementation of ISQF.

American Psychological Association (APA)

Che, Shiwei & Yang, Wu & Wang, Wei. 2020. Improved streaming quotient filter : a duplicate detection approach for data streams. The International Arab Journal of Information Technology, Vol. 17, no. 5, pp. 769-777.
https://search.emarefa.net/detail/BIM-1439766

Modern Language Association (MLA)

Che, Shiwei, et al. Improved streaming quotient filter : a duplicate detection approach for data streams. The International Arab Journal of Information Technology Vol. 17, no. 5 (Sep. 2020), pp. 769-777.
https://search.emarefa.net/detail/BIM-1439766

American Medical Association (AMA)

Che, Shiwei & Yang, Wu & Wang, Wei. Improved streaming quotient filter : a duplicate detection approach for data streams. The International Arab Journal of Information Technology. 2020. Vol. 17, no. 5, pp. 769-777.
https://search.emarefa.net/detail/BIM-1439766

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: pp. 775-777

Record ID

BIM-1439766