Handling Data Skew in MapReduce Cluster by Using Partition Tuning

Joint Authors

Zhang, Jiacai
Zhou, Bing
Gao, Yufei
Zhou, Yanjie
Shi, Lei

Source

Journal of Healthcare Engineering

Issue

Vol. 2017, Issue 2017 (31 Dec. 2017), pp. 1-12, 12 p.

Publisher

Hindawi Publishing Corporation

Publication Date

2017-03-29

Country of Publication

Egypt

No. of Pages

12

Main Subjects

Public Health
Medicine

Abstract EN

The healthcare industry has generated large amounts of data, and analyzing these data has emerged as an important problem in recent years.

The MapReduce programming model has been successfully used for big data analytics.

However, data skew invariably occurs in big data analytics and seriously affects efficiency.

To overcome the data skew problem in MapReduce, we previously proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH).

Whereas the traditional MapReduce model uses a one-stage partitioning strategy, PTSH uses a two-stage strategy with partition tuning: it disperses key-value pairs into virtual partitions and recombines the partitions when data skew occurs.
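
The contrast between one-stage and two-stage partitioning can be illustrated with a minimal Python sketch. It is not the PTSH implementation from the article: the function names, the CRC32 hash, and the greedy largest-first recombination rule are all assumptions chosen for illustration, and the abstract does not specify how PTSH tunes or recombines partitions.

```python
import zlib


def stable_hash(key: str) -> int:
    # CRC32 is used only so that results are repeatable across runs;
    # the hash function is an assumption, not taken from the article.
    return zlib.crc32(key.encode("utf-8"))


def one_stage_loads(key_counts: dict, num_reducers: int) -> list:
    # Traditional one-stage strategy: hash each key directly to a reducer.
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[stable_hash(key) % num_reducers] += count
    return loads


def virtual_partition_sizes(key_counts: dict, num_virtual: int) -> list:
    # Stage 1: disperse key-value pairs over many small virtual partitions.
    sizes = [0] * num_virtual
    for key, count in key_counts.items():
        sizes[stable_hash(key) % num_virtual] += count
    return sizes


def recombine(sizes: list, num_reducers: int):
    # Stage 2 (hypothetical rule): assign virtual partitions, largest first,
    # to whichever reducer currently has the smallest load.
    loads = [0] * num_reducers
    assignment = {}
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    for vp in order:
        target = min(range(num_reducers), key=lambda r: loads[r])
        loads[target] += sizes[vp]
        assignment[vp] = target
    return loads, assignment


if __name__ == "__main__":
    # Skewed input: one hot key plus many light keys (made-up numbers).
    key_counts = {f"k{i}": 1 for i in range(100)}
    key_counts["hot"] = 500
    print("one-stage loads:", one_stage_loads(key_counts, 4))
    sizes = virtual_partition_sizes(key_counts, 16)
    print("two-stage loads:", recombine(sizes, 4)[0])
```

In this sketch all records with the same key stay in one virtual partition, so the heaviest reducer can never carry less than the heaviest single key. Recombination only evens out how the remaining load is spread.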

The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets.

The results showed that the PTSH algorithm handles data skew in MapReduce efficiently and improves the performance of MapReduce jobs compared with native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN).

We also found that adopting the PTSH algorithm significantly reduces the time needed for rule extraction, since it is better suited to association rule mining (ARM) on healthcare data.

American Psychological Association (APA)

Gao, Yufei & Zhou, Yanjie & Zhou, Bing & Shi, Lei & Zhang, Jiacai. 2017. Handling Data Skew in MapReduce Cluster by Using Partition Tuning. Journal of Healthcare Engineering, Vol. 2017, no. 2017, pp. 1-12.
https://search.emarefa.net/detail/BIM-1180793

Modern Language Association (MLA)

Gao, Yufei, et al. Handling Data Skew in MapReduce Cluster by Using Partition Tuning. Journal of Healthcare Engineering No. 2017 (2017), pp. 1-12.
https://search.emarefa.net/detail/BIM-1180793

American Medical Association (AMA)

Gao, Yufei & Zhou, Yanjie & Zhou, Bing & Shi, Lei & Zhang, Jiacai. Handling Data Skew in MapReduce Cluster by Using Partition Tuning. Journal of Healthcare Engineering. 2017. Vol. 2017, no. 2017, pp. 1-12.
https://search.emarefa.net/detail/BIM-1180793

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-1180793