Mesos / MESOS-646

Slave recovery doesn't properly handle checkpointed queued tasks

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

    Description

      If the slave dies after checkpointing a queued task but before the task is launched on an executor, the recovered slave does not have enough information to relaunch it, because we checkpoint only Task rather than TaskInfo.
      When the executor re-registers, the slave should simply remove these tasks from its map.

      Alternatively, the slave could checkpoint TaskInfo instead of Task. We don't do this because TaskInfo.data could potentially be huge.
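
      The cleanup described above could be sketched roughly as follows. This is only an illustration, not the actual Mesos slave code: CheckpointedTask, RecoveredExecutor and dropUnlaunchedTasks are hypothetical names, and the sketch assumes the re-registering executor reports the IDs of the tasks it actually received, so any recovered task missing from that report was still queued when the slave died.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a checkpointed Task record. It carries the task's
// identity but not the TaskInfo payload that would be needed to relaunch it.
struct CheckpointedTask {
  std::string id;
};

// Hypothetical stand-in for the slave's per-executor state after recovery:
// every task that was checkpointed, whether or not it was ever launched.
struct RecoveredExecutor {
  std::map<std::string, CheckpointedTask> tasks;
};

// Assumed to be called when the executor re-registers. Removes every recovered
// task that the executor did not report and returns the IDs that were removed.
std::vector<std::string> dropUnlaunchedTasks(
    RecoveredExecutor& executor,
    const std::set<std::string>& reportedByExecutor)
{
  std::vector<std::string> dropped;
  for (auto it = executor.tasks.begin(); it != executor.tasks.end();) {
    if (reportedByExecutor.count(it->first) == 0) {
      // Checkpointed while queued, never launched: the slave cannot relaunch
      // it from a Task record alone, so it is removed from the map.
      dropped.push_back(it->first);
      it = executor.tasks.erase(it);
    } else {
      ++it;
    }
  }
  return dropped;
}
```

      Returning the dropped IDs leaves it to the caller to decide what else, if anything, should happen to those tasks.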

          People

            Assignee: Vinod Kone (vinodkone)
            Reporter: Vinod Kone (vinodkone)