Improved streaming quotient filter : a duplicate detection approach for data streams

Co-authors

Wang, Wei
Yang, Wu
Che, Shiwei

Source

The International Arab Journal of Information Technology

Issue

Vol. 17, no. 5 (30 September 2020), pp. 769-777, 9 p.

Publisher

Zarqa University, Deanship of Scientific Research

Publication Date

2020-09-30

Country of Publication

Jordan

Number of Pages

9

Main Subjects

Information Technology and Computer Science

Abstract (EN)

The rapid development and popularization of the Internet, combined with the emergence of modern applications such as search engines, online transactions, and climate warning systems, has caused the volume of data stored worldwide to grow at an unprecedented rate.

Efficient storage, management and processing of such huge amounts of data has become an important academic research topic.

Detecting and removing duplicate and redundant data at this multi-trillion scale, while keeping resource use and computation efficient, is a challenging area of research.

Because a potentially unbounded data stream cannot be stored in full, and duplicates must be removed as accurately as possible, intelligent approximate duplicate detection algorithms are urgently required.

The literature describes many well-known methods based on bitmap structures, the Bloom filter, and its variants.

In this paper, we propose a new data structure, Improved Streaming Quotient Filter (ISQF), to efficiently detect and remove duplicate data in a data stream.

ISQF stores the signatures of the elements in a data stream and uses an eviction strategy to keep error rates near zero.

We show that ISQF achieves near-optimal performance and a very low error rate with fairly low memory requirements, making it an efficient method for duplicate detection.

Empirically, we compared ISQF with several existing methods, in particular the Streaming Quotient Filter (SQF).

The results show that our proposed method outperforms the existing methods in terms of memory usage and accuracy.

We also discuss the parallel implementation of ISQF.
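
The abstract does not give the details of ISQF, so the following is only a minimal Python sketch of the general idea it names: storing short signatures of stream elements in bounded memory and evicting old entries. It is not the authors' algorithm. The class name `SignatureFilter`, the parameters `quotient_bits`, `remainder_bits`, and `bucket_size`, the SHA-256 fingerprint, and the oldest-first eviction rule are all assumptions chosen for illustration.

```python
import hashlib
from collections import deque


class SignatureFilter:
    """Toy streaming duplicate detector (illustrative, not the published ISQF)."""

    def __init__(self, quotient_bits=10, remainder_bits=16, bucket_size=4):
        self.q_bits = quotient_bits
        self.r_bits = remainder_bits
        # One bounded bucket per quotient value. A deque with maxlen drops
        # its oldest entry when full, which serves as the eviction rule here.
        self.buckets = [deque(maxlen=bucket_size)
                        for _ in range(1 << quotient_bits)]

    def _signature(self, item):
        # Hash the item and keep (q_bits + r_bits) bits as its fingerprint.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        fingerprint = value & ((1 << (self.q_bits + self.r_bits)) - 1)
        quotient = fingerprint >> self.r_bits            # selects the bucket
        remainder = fingerprint & ((1 << self.r_bits) - 1)  # stored signature
        return quotient, remainder

    def seen_before(self, item):
        """Return True if the item's signature is stored; otherwise store it."""
        quotient, remainder = self._signature(item)
        bucket = self.buckets[quotient]
        if remainder in bucket:
            return True   # reported duplicate (may be a false positive)
        bucket.append(remainder)  # may evict the oldest signature
        return False


def deduplicate(stream, detector):
    """Keep only the elements the detector reports as new."""
    return [x for x in stream if not detector.seen_before(x)]


print(deduplicate(["a", "b", "a", "c", "b", "a"], SignatureFilter()))
# prints ['a', 'b', 'c']
```

In this sketch, two distinct elements with the same fingerprint produce a false positive, and an evicted signature produces a false negative if its element reappears. Larger remainders and buckets reduce both at the cost of memory, which is the kind of trade-off the abstract refers to.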

American Psychological Association (APA) citation style

Che, Shiwei & Yang, Wu & Wang, Wei. 2020. Improved streaming quotient filter : a duplicate detection approach for data streams. The International Arab Journal of Information Technology, Vol. 17, no. 5, pp. 769-777.
https://search.emarefa.net/detail/BIM-1439766

Modern Language Association (MLA) citation style

Che, Shiwei…[et al.]. Improved streaming quotient filter : a duplicate detection approach for data streams. The International Arab Journal of Information Technology Vol. 17, no. 5 (Sep. 2020), pp. 769-777.
https://search.emarefa.net/detail/BIM-1439766

American Medical Association (AMA) citation style

Che, Shiwei & Yang, Wu & Wang, Wei. Improved streaming quotient filter : a duplicate detection approach for data streams. The International Arab Journal of Information Technology. 2020. Vol. 17, no. 5, pp. 769-777.
https://search.emarefa.net/detail/BIM-1439766

Data Type

Articles

Text Language

English

Notes

Includes bibliographical references : p. 775-777

Record ID

BIM-1439766