Hama / HAMA-505 Fault Tolerant Job Processing / HAMA-533

BSP Peer should be able to start from a non-zero superstep, using the partition of checkpointed messages for a given task ID and attempt ID

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: bsp core
    • Labels: None

    Description

      Currently, a BSP Peer always initializes itself to start a task afresh from superstep -1. We should have a new flavor of BSP Peer that the GroomServer starts in order to recover a task, as directed by the BSP Master. This peer should start the task knowing the superstep number to resume from, the task ID, the attempt ID, and the partition of the checkpointed file to read. The input would not be split in this case. The BSP Peer should update its task status at the GroomServer to RUNNING once the task is out of the recovery superstep's barrier sync.
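
      The requested behavior could be sketched roughly as follows. This is only an illustration of the request, not Hama code: RecoveryPeerSketch, RecoveryDirective, Peer, TaskState, and the in-memory checkpoint map are all hypothetical names, and treating any superstep >= 0 as a valid recovery point is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are Apache Hama classes.
public class RecoveryPeerSketch {

    enum TaskState { RECOVERING, RUNNING }

    /** What a GroomServer might hand to a peer it starts in recovery mode. */
    static final class RecoveryDirective {
        final String taskId;
        final int attemptId;
        final long startSuperstep;
        final String checkpointPartition;

        RecoveryDirective(String taskId, int attemptId,
                          long startSuperstep, String checkpointPartition) {
            // Assumption: a recovery always resumes at superstep 0 or later.
            if (startSuperstep < 0) {
                throw new IllegalArgumentException("recovery needs a superstep >= 0");
            }
            this.taskId = taskId;
            this.attemptId = attemptId;
            this.startSuperstep = startSuperstep;
            this.checkpointPartition = checkpointPartition;
        }
    }

    static final class Peer {
        final String taskId;
        final int attemptId;
        long superstep;
        TaskState state;
        boolean readsInputSplit;
        final List<String> inbox = new ArrayList<>();

        private Peer(String taskId, int attemptId) {
            this.taskId = taskId;
            this.attemptId = attemptId;
        }

        /** Current behavior described in the issue: start afresh at superstep -1. */
        static Peer fresh(String taskId, int attemptId) {
            Peer p = new Peer(taskId, attemptId);
            p.superstep = -1;
            p.state = TaskState.RUNNING; // simplification for this sketch
            p.readsInputSplit = true;
            return p;
        }

        /** Requested behavior: resume from checkpointed messages, without an input split. */
        static Peer recover(RecoveryDirective d, Map<String, List<String>> checkpointStore) {
            List<String> messages = checkpointStore.get(d.checkpointPartition);
            if (messages == null) {
                throw new IllegalArgumentException(
                        "no checkpoint partition: " + d.checkpointPartition);
            }
            Peer p = new Peer(d.taskId, d.attemptId);
            p.superstep = d.startSuperstep;
            p.state = TaskState.RECOVERING;
            p.readsInputSplit = false;
            p.inbox.addAll(messages);
            return p;
        }

        /** Called once the recovery superstep's barrier sync has completed. */
        void leaveRecoveryBarrier() {
            if (state == TaskState.RECOVERING) {
                state = TaskState.RUNNING;
            }
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the checkpointed file partitions.
        Map<String, List<String>> store = new HashMap<>();
        store.put("part-00003", Arrays.asList("msg-a", "msg-b"));

        Peer peer = Peer.recover(
                new RecoveryDirective("task_0003", 1, 7L, "part-00003"), store);
        System.out.println(peer.state + " at superstep " + peer.superstep);
        peer.leaveRecoveryBarrier();
        System.out.println(peer.state + " with " + peer.inbox.size() + " restored messages");
    }
}
```

      Keeping the recovery path in a separate factory method leaves the fresh-start path untouched, which is one way to read "a new flavor of BSP Peer"; the actual design is left open by the issue.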

          People

            Assignee: Unassigned
            Reporter: Suraj Menon (surajsmenon)
            Votes: 0
            Watchers: 1