Mesos / MESOS-614

Master should remove checkpointing slave that gets disconnected when the new slave tries to register

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

    Description

      When a checkpointing slave cannot recover (for whatever reason), it tries to register as a new slave. If this registration happens before the master has removed the old slave, the master simply assigns the old slave ID to the new slave. As a result, the master believes the slave is running a set of tasks, whereas the slave considers itself new.

      When this happens, the master should remove the old slave from its map (sending TASK_LOST updates for its tasks) and create a new slave entry.
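
      The proposed behavior can be illustrated with a small toy model. This is only a sketch and not the actual Mesos master code (which is C++): the names `ToyMaster`, `SlaveEntry`, `register_slave`, and `status_updates`, as well as keying the slave map by network address and the `slave-N` ID format, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SlaveEntry:
    """Hypothetical record the master keeps per registered slave."""
    slave_id: str
    tasks: list = field(default_factory=list)


class ToyMaster:
    """Toy stand-in for the master; not the real Mesos implementation."""

    def __init__(self):
        self.slaves = {}          # address -> SlaveEntry (assumed keying)
        self.status_updates = []  # (task_id, state) pairs that would be sent
        self._ids = count(1)

    def register_slave(self, address):
        # If an old entry still exists for this address, the slave failed to
        # recover and is registering as new: drop the stale entry and report
        # its tasks as lost instead of reusing the old slave ID.
        old = self.slaves.pop(address, None)
        if old is not None:
            for task_id in old.tasks:
                self.status_updates.append((task_id, "TASK_LOST"))

        # Always create a fresh entry with a new ID and no tasks.
        entry = SlaveEntry(slave_id=f"slave-{next(self._ids)}")
        self.slaves[address] = entry
        return entry.slave_id


master = ToyMaster()
old_id = master.register_slave("10.0.0.5:5051")
master.slaves["10.0.0.5:5051"].tasks.extend(["task-a", "task-b"])

# The slave fails to recover and registers again before being removed.
new_id = master.register_slave("10.0.0.5:5051")
```

      In this sketch the second registration yields a different slave ID, the two tasks from the stale entry are reported as TASK_LOST, and the new entry starts with no tasks, so the master and the slave agree on the slave's state.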

          People

            Assignee: Vinod Kone (vinodkone)
            Reporter: Vinod Kone (vinodkone)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: