Handling Data Skew in MapReduce Cluster by Using Partition Tuning

Co-authors

Zhang, Jiacai
Zhou, Bing
Gao, Yufei
Zhou, Yanjie
Shi, Lei

Source

Journal of Healthcare Engineering

Issue

Vol. 2017, no. 2017 (31 December 2017), pp. 1-12, 12 p.

Publisher

Hindawi Publishing Corporation

Publication date

2017-03-29

Country of publication

Egypt

Number of pages

12

Main subjects

Public Health
Medicine

Abstract (EN)

The healthcare industry has generated large amounts of data, and analyzing these data has emerged as an important problem in recent years.

The MapReduce programming model has been successfully used for big data analytics.

However, data skew invariably occurs in big data analytics and seriously affects efficiency.

To overcome the data skew problem in MapReduce, we previously proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH).

In contrast to the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy: the partition tuning method first disperses key-value pairs into virtual partitions and then, in case of data skew, recombines them into the final partitions.
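The two-stage idea can be illustrated with a minimal sketch. This is not the authors' PTSH implementation, whose details the abstract does not give; it only assumes that keys are first hashed into many small virtual partitions and that those virtual partitions are then packed onto reducers by a simple greedy rule (heaviest virtual partition first, onto the currently lightest reducer). The function names, the CRC32 hash, and the greedy packing rule are all assumptions made for illustration.

```python
import heapq
import zlib
from collections import Counter


def virtual_partition(key: str, num_virtual: int) -> int:
    """Stage 1 (assumed): map a key to one of many small virtual partitions.

    CRC32 is used because Python's built-in hash() is salted per process.
    """
    return zlib.crc32(key.encode("utf-8")) % num_virtual


def assign_virtual_partitions(keys, num_virtual: int, num_reducers: int):
    """Stage 2 (assumed): recombine virtual partitions into reducer partitions.

    Returns (assignment, reducer_loads), where assignment maps a virtual
    partition id to a reducer id and reducer_loads[r] is the record count
    that reducer r would receive.
    """
    # Count how many records fall into each virtual partition.
    loads = Counter(virtual_partition(k, num_virtual) for k in keys)

    # Min-heap of (current load, reducer id): the lightest reducer is on top.
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)

    assignment = {}
    # Place the heaviest virtual partitions first (greedy bin packing).
    for vp, size in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[vp] = reducer
        heapq.heappush(heap, (load + size, reducer))

    reducer_loads = [0] * num_reducers
    for vp, reducer in assignment.items():
        reducer_loads[reducer] += loads[vp]
    return assignment, reducer_loads


if __name__ == "__main__":
    # A skewed toy workload: one hot key plus many rare keys.
    keys = ["hot"] * 50 + [f"k{i}" for i in range(50)]
    assignment, reducer_loads = assign_virtual_partitions(keys, 16, 4)
    print(reducer_loads)
```

In this sketch all records of one key stay in the same virtual partition, so a single very hot key still lands on one reducer; the packing step only evens out the load contributed by the remaining keys.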

The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets.

The results showed that the PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN).

We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data.

American Psychological Association (APA) citation style

Gao, Yufei & Zhou, Yanjie & Zhou, Bing & Shi, Lei & Zhang, Jiacai. 2017. Handling Data Skew in MapReduce Cluster by Using Partition Tuning. Journal of Healthcare Engineering, Vol. 2017, no. 2017, pp. 1-12.
https://search.emarefa.net/detail/BIM-1180793

Modern Language Association (MLA) citation style

Gao, Yufei … [et al.]. Handling Data Skew in MapReduce Cluster by Using Partition Tuning. Journal of Healthcare Engineering No. 2017 (2017), pp. 1-12.
https://search.emarefa.net/detail/BIM-1180793

American Medical Association (AMA) citation style

Gao, Yufei & Zhou, Yanjie & Zhou, Bing & Shi, Lei & Zhang, Jiacai. Handling Data Skew in MapReduce Cluster by Using Partition Tuning. Journal of Healthcare Engineering. 2017. Vol. 2017, no. 2017, pp. 1-12.
https://search.emarefa.net/detail/BIM-1180793

Data type

Articles

Text language

English

Notes

Includes bibliographical references

Record number

BIM-1180793