Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
When a checkpointing slave is unable to recover (for whatever reason), it tries to register as a new slave. But if this registration happens before the master has removed the old slave, the master simply hands the old slave's id to the new slave. As a result, the master thinks the slave is running a bunch of tasks, whereas the slave thinks it is new.
When this happens, the master should instead remove the old slave from its map (sending TASK_LOST updates for its tasks) and create a new slave entry.
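A minimal Python sketch of the proposed master behavior, for illustration only. Everything in it is hypothetical and is not the actual Mesos master code: the `Master` and `SlaveInfo` classes, the `register_slave`/`remove_slave` methods, the generated ids `S1`, `S2`, ..., and the assumption that the master recognizes a re-registering slave by its `pid` (network address). It shows the three steps described above: drop the stale entry, emit TASK_LOST for each of its tasks, and hand out a fresh slave id.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class SlaveInfo:
    slave_id: str
    pid: str  # hypothetical: the slave's address, e.g. "slave@10.0.0.5:5051"
    tasks: set = field(default_factory=set)


class Master:
    def __init__(self):
        self._ids = itertools.count(1)  # source of fresh slave ids
        self.slaves = {}                # slave_id -> SlaveInfo
        self.status_updates = []        # (task_id, state) sent to frameworks

    def _find_by_pid(self, pid):
        # Assumption: a stale entry is detected by a matching pid.
        for info in self.slaves.values():
            if info.pid == pid:
                return info
        return None

    def register_slave(self, pid):
        old = self._find_by_pid(pid)
        if old is not None:
            # The process registered as *new*, so it knows nothing about the
            # old slave's tasks. Remove the old entry instead of reusing its id.
            self.remove_slave(old.slave_id)
        slave_id = f"S{next(self._ids)}"
        self.slaves[slave_id] = SlaveInfo(slave_id, pid)
        return slave_id

    def remove_slave(self, slave_id):
        info = self.slaves.pop(slave_id)
        # Report every task of the removed slave as lost.
        for task_id in sorted(info.tasks):
            self.status_updates.append((task_id, "TASK_LOST"))


# Usage: a slave registers, runs tasks, then fails to recover and
# registers again as a new slave from the same address.
master = Master()
first = master.register_slave("slave@10.0.0.5:5051")
master.slaves[first].tasks.update({"t1", "t2"})
second = master.register_slave("slave@10.0.0.5:5051")
```

After the second registration the master no longer holds `first`, has queued TASK_LOST for `t1` and `t2`, and tracks the slave under the new id `second` with an empty task set, so the master's view and the slave's view agree.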