Optimizing Checkpoint Restart with Data Deduplication

Joint Authors

Chen, Zhengyu
Sun, Jianhua
Chen, Hao

Source

Scientific Programming

Issue

Vol. 2016, Issue 2016 (31 Dec. 2016), pp. 1-11, 11 p.

Publisher

Hindawi Publishing Corporation

Publication Date

2016-06-08

Country of Publication

Egypt

No. of Pages

11

Main Subjects

Mathematics

Abstract EN

As computer systems grow in size and complexity, hardware and software faults occur more frequently; fault-tolerant techniques have therefore become an essential component of high-performance computing systems.

Checkpoint restart is a typical and widely used method for tolerating runtime faults.

However, the exploding sizes of checkpoint files that need to be saved to external storage pose a major scalability challenge, necessitating the design of efficient approaches to reducing the amount of checkpointing data.

In this paper, we first motivate the need for redundancy elimination with a detailed analysis of checkpoint data from real scenarios.

Based on the analysis, we apply inline data deduplication to achieve the objective of reducing checkpoint size.

We use DMTCP, an open-source checkpoint restart package, to validate our method.

Our experiment shows that our method reduces the checkpoint file size by 20% for single-computer programs and by 47% for distributed programs.

American Psychological Association (APA)

Chen, Zhengyu & Sun, Jianhua & Chen, Hao. 2016. Optimizing Checkpoint Restart with Data Deduplication. Scientific Programming, Vol. 2016, no. 2016, pp. 1-11.
https://search.emarefa.net/detail/BIM-1118404

Modern Language Association (MLA)

Chen, Zhengyu… [et al.]. Optimizing Checkpoint Restart with Data Deduplication. Scientific Programming No. 2016 (2016), pp. 1-11.
https://search.emarefa.net/detail/BIM-1118404

American Medical Association (AMA)

Chen, Zhengyu & Sun, Jianhua & Chen, Hao. Optimizing Checkpoint Restart with Data Deduplication. Scientific Programming. 2016. Vol. 2016, no. 2016, pp. 1-11.
https://search.emarefa.net/detail/BIM-1118404

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-1118404