Details
Type: Improvement
Status: Open
Priority: Not a Priority
Resolution: Unresolved
Description
Currently, retained checkpoints are persisted with one of 3 strategies:
- CHECKPOINT_NEVER_RETAINED: Retained checkpoints are never persisted
- CHECKPOINT_RETAINED_ON_FAILURE: Latest retained checkpoint is persisted in the face of job failures
- CHECKPOINT_RETAINED_ON_CANCELLATION: Latest retained checkpoint is persisted when job is canceled externally (e.g. via the REST API)
I'm proposing a fourth persistence mode: CHECKPOINT_RETAINED_ALWAYS. In this mode, the latest retained checkpoint would also be persisted when the job completes successfully, so that the job can be resumed from it later.
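To make the intended semantics concrete, here is a minimal sketch of the existing modes plus the proposed one, written as a standalone Java enum. It is illustrative only and is not the existing implementation: the class `CheckpointRetentionSketch`, the `JobOutcome` enum, and the `shouldPersist` method are hypothetical names. It also assumes that each existing mode persists only in the situation named in its description above, and that CHECKPOINT_RETAINED_ALWAYS persists on failure and cancellation as well as on successful completion.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; none of these types exist in the code base under this name.
public class CheckpointRetentionSketch {

    /** Terminal job outcomes (illustrative names, assumed for this sketch). */
    enum JobOutcome { FAILED, CANCELED, FINISHED }

    /** The three modes from the description plus the proposed fourth one. */
    enum RetentionMode {
        CHECKPOINT_NEVER_RETAINED(EnumSet.noneOf(JobOutcome.class)),
        CHECKPOINT_RETAINED_ON_FAILURE(EnumSet.of(JobOutcome.FAILED)),
        CHECKPOINT_RETAINED_ON_CANCELLATION(EnumSet.of(JobOutcome.CANCELED)),
        // Proposed mode. Assumption: "always" covers every terminal outcome.
        CHECKPOINT_RETAINED_ALWAYS(EnumSet.allOf(JobOutcome.class));

        private final Set<JobOutcome> persistOn;

        RetentionMode(Set<JobOutcome> persistOn) {
            this.persistOn = persistOn;
        }

        /** Returns true if the latest retained checkpoint should be kept for this outcome. */
        boolean shouldPersist(JobOutcome outcome) {
            return persistOn.contains(outcome);
        }
    }

    public static void main(String[] args) {
        // Show which modes keep the checkpoint after a successful run.
        for (RetentionMode mode : RetentionMode.values()) {
            System.out.println(mode + " persists on FINISHED: "
                    + mode.shouldPersist(JobOutcome.FINISHED));
        }
    }
}
```

Under these assumptions, only CHECKPOINT_RETAINED_ALWAYS keeps the checkpoint after a job finishes normally, which is the gap this ticket describes.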