Details
Bug
Status: Resolved
Major
Resolution: Fixed
1.1.0
None
None
Description
While running different jobs with checkpoints enabled, I noticed some issues:
1) The way we pick the latest checkpoint is incorrect. The current implementation picks whatever FileSystem.list() returns last, which is not necessarily the most recent checkpoint.
2) If a job restarts from a checkpoint, it immediately creates another checkpoint.
3) We need more flexibility in GiraphJobRetryChecker to allow restarts after multiple failures.
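
To illustrate issue 1, here is a minimal sketch of choosing the latest checkpoint by superstep number instead of by listing order. It is not the actual fix: the class name LatestCheckpoint, the method latestSuperstep, and the "<superstep>.finalized" naming scheme are assumptions made for illustration, and the code works on plain file names rather than on a real FileSystem listing.

```java
import java.util.List;
import java.util.OptionalLong;

public class LatestCheckpoint {
    // Assumed marker suffix for a completed checkpoint; the real naming may differ.
    private static final String FINALIZED_SUFFIX = ".finalized";

    /**
     * Returns the highest superstep number among finalized checkpoint
     * file names, or empty if no name matches the assumed pattern.
     */
    public static OptionalLong latestSuperstep(List<String> fileNames) {
        long best = 0;
        boolean found = false;
        for (String name : fileNames) {
            if (!name.endsWith(FINALIZED_SUFFIX)) {
                continue; // not a finalized-checkpoint marker
            }
            String prefix =
                name.substring(0, name.length() - FINALIZED_SUFFIX.length());
            try {
                long superstep = Long.parseLong(prefix);
                // Compare numerically, so "10" beats "9" regardless of order.
                if (!found || superstep > best) {
                    best = superstep;
                    found = true;
                }
            } catch (NumberFormatException e) {
                // Ignore files whose prefix is not a superstep number.
            }
        }
        return found ? OptionalLong.of(best) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Listing order is arbitrary: the last entry is not the latest checkpoint.
        List<String> listing =
            List.of("10.finalized", "2.finalized", "9.finalized", "11.metadata");
        System.out.println(latestSuperstep(listing));
    }
}
```

Comparing parsed numbers rather than relying on the order of the listing (or on string order) avoids treating superstep 9 as newer than superstep 10.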