Privacy-preserving for distributed data streams : towards l-diversity

Co-authors

Muhammad, Muna
Naji, Majdi
Ghanim, Sahar

Source

The International Arab Journal of Information Technology

Issue

Vol. 17, no. 1 (31 January 2020), pp. 52-64, 13 p.

Publisher

Zarqa University, Deanship of Scientific Research

Publication date

2020-01-31

Country of publication

Jordan

Number of pages

13

Main subjects

Engineering and Technology Sciences (Multidisciplinary)

Abstract EN

Privacy-preserving data publishing has been studied widely on static data.

However, many recent applications generate data streams that are real-time, unbounded, rapidly changing, and distributed in nature.

Recently, a few works have addressed k-anonymity and l-diversity for data streams.

Their models imply that if the stream is distributed, it is collected at a central site for anonymization.

In this paper, we propose a novel distributed model where distributed streams are first anonymized by distributed (collecting) sites before merging and releasing.

Our approach extends Continuously Anonymizing STreaming data via adaptive cLustEring (CASTLE), a cluster-based approach that provides both k-anonymity and l-diversity for centralized data streams.
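As a rough illustration of the two guarantees named above, the following sketch checks whether a single cluster could be released. It assumes the simplest "distinct" reading of l-diversity (at least l different sensitive values per cluster); the abstract does not state which variant the paper uses, and the function name and tuple layout are hypothetical.

```python
from typing import Iterable, Tuple


def satisfies_k_and_l(cluster: Iterable[Tuple[str, str]], k: int, l: int) -> bool:
    """Check one cluster of (person_id, sensitive_value) pairs.

    Assumption: k-anonymity is read as "at least k distinct individuals"
    and l-diversity as "at least l distinct sensitive values".
    """
    tuples = list(cluster)
    persons = {person_id for person_id, _ in tuples}
    sensitive_values = {value for _, value in tuples}
    return len(persons) >= k and len(sensitive_values) >= l


# Three individuals, two distinct sensitive values.
example_cluster = [("p1", "flu"), ("p2", "flu"), ("p3", "asthma")]
ok_for_l2 = satisfies_k_and_l(example_cluster, k=3, l=2)
ok_for_l3 = satisfies_k_and_l(example_cluster, k=3, l=3)
```

Here the example cluster is large enough for k = 3 and diverse enough for l = 2, but not for l = 3, so under these assumptions it would have to absorb more tuples before release in the stricter setting.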

The main idea is for each site to construct its local clustering model and exchange this local view with other sites to globally construct approximately the same clustering view.

The approach is heuristic in the sense that not every update to the local view is sent; instead, triggering events are selected for exchanging cluster information.

Extensive experiments on a real data set are performed to study the introduced Information Loss (IL) under different settings.

First, the impact of the different parameters on IL is quantified.
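The idea of sending only selected updates can be sketched as follows. The abstract does not say which events trigger an exchange, so the rule used here (send when a cluster's range on one numeric attribute has grown by more than a threshold since the last message) is purely an assumption, as are the names `LocalCluster`, `should_send`, and `process`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LocalCluster:
    """A site's local cluster, reduced to one numeric attribute range."""
    low: float
    high: float
    last_sent: Optional[Tuple[float, float]] = None

    def add(self, value: float) -> None:
        # Enlarge the range so that it covers the new tuple.
        self.low = min(self.low, value)
        self.high = max(self.high, value)

    def should_send(self, threshold: float) -> bool:
        # Hypothetical trigger: the first update is always sent; later
        # updates are sent only if the range grew by more than `threshold`.
        if self.last_sent is None:
            return True
        sent_low, sent_high = self.last_sent
        growth = (sent_low - self.low) + (self.high - sent_high)
        return growth > threshold

    def mark_sent(self) -> None:
        self.last_sent = (self.low, self.high)


def process(values: List[float], threshold: float) -> List[Tuple[float, float]]:
    """Feed a stream into one cluster and return the messages it would send."""
    cluster = LocalCluster(values[0], values[0])
    messages = []
    for value in values:
        cluster.add(value)
        if cluster.should_send(threshold):
            messages.append((cluster.low, cluster.high))
            cluster.mark_sent()
    return messages


messages = process([30, 31, 32, 40, 41], threshold=5)
```

With these five values and a threshold of 5, only two of the five local updates produce a message: the initial range and the jump to 40. This is the kind of trade-off between messaging cost and accuracy of the shared view that the abstract describes, though the actual triggering events in the paper may differ.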

Then k-anonymity and l-diversity are compared in terms of messaging cost and IL.

Finally, the effectiveness of the proposed distributed model is studied by comparing the introduced IL to the IL of the centralized model (as a lower bound) and to a distributed model with no communication (as an upper bound).

The experimental results show that the main contributing factor to IL is the number of attributes in the quasi-identifier (50%-75%), whereas the number of sites contributes only about 1%, which demonstrates the scalability of the proposed approach.

In addition, providing l-diversity is shown to introduce an increase of about 25% in IL compared to k-anonymity.

Moreover, a 35% reduction in IL is achieved at a messaging cost (in bytes) of about 0.3% of the data set size.

American Psychological Association (APA) citation style

Muhammad, Muna & Ghanim, Sahar & Naji, Majdi. 2020. Privacy-preserving for distributed data streams : towards l-diversity. The International Arab Journal of Information Technology, Vol. 17, no. 1, pp. 52-64.
https://search.emarefa.net/detail/BIM-955408

Modern Language Association (MLA) citation style

Muhammad, Muna…[et al.]. Privacy-preserving for distributed data streams : towards l-diversity. The International Arab Journal of Information Technology Vol. 17, no. 1 (Jan. 2020), pp.52-64.
https://search.emarefa.net/detail/BIM-955408

American Medical Association (AMA) citation style

Muhammad, Muna & Ghanim, Sahar & Naji, Majdi. Privacy-preserving for distributed data streams : towards l-diversity. The International Arab Journal of Information Technology. 2020. Vol. 17, no. 1, pp. 52-64.
https://search.emarefa.net/detail/BIM-955408

Data type

Articles

Text language

English

Notes

Includes bibliographical references : p. 62-63

Record number

BIM-955408