Improving IO Efficiency in Hadoop-Based Massive Data Analysis Programs
Joint Authors
Lee, Kyong-Ha
Kang, Woo Lam
Suh, Young-Kyoon
Source
Scientific Programming
Issue
Vol. 2018, Issue 2018 (31 Dec. 2018), pp. 1-9, 9 p.
Publisher
Hindawi Publishing Corporation
Publication Date
2018-12-02
Country of Publication
Egypt
No. of Pages
9
Abstract EN
Apache Hadoop has been a popular parallel processing tool in the era of big data.
While practitioners have rewritten many conventional analysis algorithms to tailor them to Hadoop, inefficient I/O in Hadoop-based programs has been repeatedly reported in the literature.
In this article, we address the problem of I/O inefficiency in Hadoop-based massive data analysis by introducing an efficient modification of Hadoop.
We first incorporate a columnar data layout into the conventional Hadoop framework, without any modification of the Hadoop internals.
We also provide Hadoop with an indexing capability that saves a large amount of I/O when processing not only selection predicates but also star-join queries, which are common in analysis tasks.
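To make the columnar-layout idea in the abstract concrete, the following is a minimal sketch in Python. It is not the authors' implementation and does not use Hadoop. The toy table, the 4-byte field size, and the helper names (`row_layout_bytes_read`, `column_layout_bytes_read`, `sum_price_columnar`) are assumptions made only for illustration. It shows why a query that touches one column may read far fewer bytes from a column-oriented layout than from a row-oriented one.

```python
# Illustrative only: toy row vs. column layouts, not the article's implementation.
import struct

# Hypothetical table with columns (id, price, quantity), each a 4-byte integer.
rows = [(i, i * 10, i % 7) for i in range(1000)]
FIELD = struct.calcsize("<i")  # bytes per stored value (4)

def row_layout_bytes_read(rows, wanted_columns):
    # A row-oriented file stores whole records together, so a scan reads
    # every field of every row even if only some columns are needed.
    return len(rows) * len(rows[0]) * FIELD

def column_layout_bytes_read(rows, wanted_columns):
    # A column-oriented file stores each column contiguously, so a scan
    # reads only the columns the query actually touches.
    return len(rows) * len(wanted_columns) * FIELD

def sum_price_columnar(rows):
    # Transpose into columns, then aggregate over the 'price' column alone.
    columns = list(zip(*rows))
    price = columns[1]
    return sum(price)

row_bytes = row_layout_bytes_read(rows, [1])
col_bytes = column_layout_bytes_read(rows, [1])
```

Under these assumptions, a sum over `price` reads one of three columns, so the columnar scan reads about a third of the bytes of the row scan. Real savings depend on the schema, the compression used, and the storage block size.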
American Psychological Association (APA)
Lee, Kyong-Ha & Kang, Woo Lam & Suh, Young-Kyoon. 2018. Improving IO Efficiency in Hadoop-Based Massive Data Analysis Programs. Scientific Programming, Vol. 2018, no. 2018, pp. 1-9.
https://search.emarefa.net/detail/BIM-1214653
Modern Language Association (MLA)
Lee, Kyong-Ha…[et al.]. Improving IO Efficiency in Hadoop-Based Massive Data Analysis Programs. Scientific Programming No. 2018 (2018), pp. 1-9.
https://search.emarefa.net/detail/BIM-1214653
American Medical Association (AMA)
Lee, Kyong-Ha & Kang, Woo Lam & Suh, Young-Kyoon. Improving IO Efficiency in Hadoop-Based Massive Data Analysis Programs. Scientific Programming. 2018. Vol. 2018, no. 2018, pp. 1-9.
https://search.emarefa.net/detail/BIM-1214653
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references
Record ID
BIM-1214653