[GIRAPH-950] Auto-restart from checkpoint doesn't pick up latest checkpoint

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels: None

      Description

      While running different jobs with checkpoints enabled, I noticed several issues:
      1) The latest checkpoint is not selected correctly. The current implementation picks whatever entry FileSystem.list() returns last, which is not necessarily the most recent checkpoint.
      2) If a job restarts from a checkpoint, it immediately creates another checkpoint.
      3) GiraphJobRetryChecker needs more flexibility to allow restarts after multiple failures.
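
      A minimal sketch of one way to address issue 1: choose the checkpoint by the superstep number encoded in the file name instead of relying on listing order. This is an illustration, not the committed fix. The class name LatestCheckpoint, the method latestSuperstep, and the "<superstep>.finalized" naming scheme are assumptions made for the example.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of the Giraph code base. */
public class LatestCheckpoint {
    // Assumed naming scheme: a finalized checkpoint is marked by a file
    // named "<superstep>.finalized". Other files are ignored.
    private static final Pattern FINALIZED =
        Pattern.compile("(\\d{1,18})\\.finalized");

    /**
     * Returns the highest superstep that has a finalized checkpoint,
     * regardless of the order in which the file names were listed.
     */
    public static OptionalLong latestSuperstep(List<String> fileNames) {
        long best = -1;
        for (String name : fileNames) {
            Matcher m = FINALIZED.matcher(name);
            if (m.matches()) {
                // Compare numerically: as strings, "9" would sort after "10".
                best = Math.max(best, Long.parseLong(m.group(1)));
            }
        }
        return best < 0 ? OptionalLong.empty() : OptionalLong.of(best);
    }
}
```

      With a listing such as 9.finalized, 10.finalized, 2.finalized, taking the last entry would restart from superstep 2, whereas the numeric comparison selects superstep 10. An empty result means no finalized checkpoint was found, so the job cannot restart from one.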

            People

            • Assignee: Unassigned
            • Reporter: Sergey Edunov (edunov)
