Handling Data Skew in MapReduce Cluster by Using Partition Tuning
Joint Authors
Zhang, Jiacai
Zhou, Bing
Gao, Yufei
Zhou, Yanjie
Shi, Lei
Source
Journal of Healthcare Engineering
Issue
Vol. 2017, Issue 2017 (31 Dec. 2017), pp.1-12, 12 p.
Publisher
Hindawi Publishing Corporation
Publication Date
2017-03-29
Country of Publication
Egypt
No. of Pages
12
Abstract EN
The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years.
The MapReduce programming model has been successfully used for big data analytics.
However, data skew invariably occurs in big data analytics and seriously affects efficiency.
To overcome the data skew problem in MapReduce, we previously proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH).
Whereas the traditional MapReduce model uses a one-stage partitioning strategy, PTSH uses a two-stage strategy: its partition tuning method disperses key-value pairs into virtual partitions and recombines the partitions in case of data skew.
The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets.
The results showed that the PTSH algorithm handles data skew in MapReduce efficiently and improves the performance of MapReduce jobs in comparison with native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN).
We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data.
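The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how virtual partitions are recombined, so the greedy largest-first assignment below is an assumed stand-in, and the names `virtual_partition`, `recombine`, and `reducer_loads` are hypothetical.

```python
import heapq
import zlib
from collections import Counter


def virtual_partition(key, num_virtual):
    """Stage 1: hash a key into one of many virtual partitions.

    crc32 is used so the mapping is stable across runs (an assumption;
    the abstract does not name a hash function).
    """
    return zlib.crc32(key.encode("utf-8")) % num_virtual


def recombine(virtual_sizes, num_reducers):
    """Stage 2: assign virtual partitions to reducers.

    Assumed rule: take partitions from largest to smallest and give each
    one to the reducer with the smallest load so far.
    """
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(virtual_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for vp, size in ordered:
        load, reducer = heapq.heappop(heap)
        assignment[vp] = reducer
        heapq.heappush(heap, (load + size, reducer))
    return assignment


def reducer_loads(pairs, num_virtual, num_reducers):
    """Count the key-value pairs each reducer receives after both stages."""
    sizes = Counter(virtual_partition(k, num_virtual) for k, _ in pairs)
    assignment = recombine(sizes, num_reducers)
    loads = [0] * num_reducers
    for vp, size in sizes.items():
        loads[assignment[vp]] += size
    return loads
```

Using more virtual partitions than reducers gives the second stage room to rebalance. In this sketch a single very frequent key still falls into one virtual partition, so it shows only the dispersal and recombination steps, not any key-splitting.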
American Psychological Association (APA)
Gao, Yufei & Zhou, Yanjie & Zhou, Bing & Shi, Lei & Zhang, Jiacai. 2017. Handling Data Skew in MapReduce Cluster by Using Partition Tuning. Journal of Healthcare Engineering, Vol. 2017, no. 2017, pp.1-12.
https://search.emarefa.net/detail/BIM-1180793
Modern Language Association (MLA)
Gao, Yufei…[et al.]. Handling Data Skew in MapReduce Cluster by Using Partition Tuning. Journal of Healthcare Engineering No. 2017 (2017), pp.1-12.
https://search.emarefa.net/detail/BIM-1180793
American Medical Association (AMA)
Gao, Yufei & Zhou, Yanjie & Zhou, Bing & Shi, Lei & Zhang, Jiacai. Handling Data Skew in MapReduce Cluster by Using Partition Tuning. Journal of Healthcare Engineering. 2017. Vol. 2017, no. 2017, pp.1-12.
https://search.emarefa.net/detail/BIM-1180793
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references
Record ID
BIM-1180793