Improving IO Efficiency in Hadoop-Based Massive Data Analysis Programs

Joint Authors

Lee, Kyong-Ha
Kang, Woo Lam
Suh, Young-Kyoon

Source

Scientific Programming

Issue

Vol. 2018, Issue 2018 (31 Dec. 2018), pp. 1-9, 9 p.

Publisher

Hindawi Publishing Corporation

Publication Date

2018-12-02

Country of Publication

Egypt

No. of Pages

9

Main Subjects

Mathematics

Abstract EN

Apache Hadoop has been a popular parallel processing tool in the era of big data.

While practitioners have rewritten many conventional analysis algorithms to tailor them to Hadoop, inefficient I/O in Hadoop-based programs has been repeatedly reported in the literature.

In this article, we address the problem of I/O inefficiency in Hadoop-based massive data analysis by introducing our efficient modification of Hadoop.

We first incorporate a columnar data layout into the conventional Hadoop framework, without any modification of the Hadoop internals.

We also provide Hadoop with indexing capability to save a huge amount of I/O while processing not only selection predicates but also star-join queries that are often used in many analysis tasks.

American Psychological Association (APA)

Lee, Kyong-Ha & Kang, Woo Lam & Suh, Young-Kyoon. 2018. Improving IO Efficiency in Hadoop-Based Massive Data Analysis Programs. Scientific Programming, Vol. 2018, no. 2018, pp. 1-9.
https://search.emarefa.net/detail/BIM-1214653

Modern Language Association (MLA)

Lee, Kyong-Ha … [et al.]. Improving IO Efficiency in Hadoop-Based Massive Data Analysis Programs. Scientific Programming No. 2018 (2018), pp. 1-9.
https://search.emarefa.net/detail/BIM-1214653

American Medical Association (AMA)

Lee, Kyong-Ha & Kang, Woo Lam & Suh, Young-Kyoon. Improving IO Efficiency in Hadoop-Based Massive Data Analysis Programs. Scientific Programming. 2018. Vol. 2018, no. 2018, pp. 1-9.
https://search.emarefa.net/detail/BIM-1214653

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-1214653