The key idea here is to provide the user a simple API to write out the "state" of its computation to a safe place, where it could be retrieved later on to restart the computation. The main use we have for this at the moment is in the context of the preemption work in
YARN-45, YARN-567, YARN-568, YARN-569, MAPREDUCE-5196, MAPREDUCE-5197, and MAPREDUCE-5189.
However checkpointing has many other uses which we can consider in the future, e.g., sub-task granularity for recovery, and many dynamic optimizations (we played around with suspending reducers voluntarily if all pending maps are struggling).
The main design point of the API is to keep the location of the checkpoint secret until the user is done writing to it, and allow to open existing checkpoint only for read. This tradeoff appending state to an existing checkpoint for simplicity in enforcing atomicity, write-once semantics. In fact, this two strong properties are almost free given the chosen API. This will benefit alternative implementations of the checkpoint API (you can envision main-memory versions of this if checkpointing is used only to "transfer" a computation instead of long-term suspension or failure-tollerance, we had a map-servlet based version as well, etc...).