Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
- None
Description
The agent doesn't recover its state after a host reboot; instead, it registers with the master and gets a new SlaveID. With partition awareness, agents are now allowed to re-register after they have been marked Unreachable. The executors are terminated on the agent when it reboots anyway, so there is no harm in letting the agent keep its SlaveID, re-register with the master, and reconcile the lost executors. This is a prerequisite for supporting persistent/restartable tasks in Mesos (MESOS-3545).
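The proposed behavior can be sketched as a small decision function. This is only an illustrative model, not Mesos code: `Checkpoint`, `RecoveryPlan`, `plan_recovery`, and the idea of comparing a stored boot ID against the current one are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Checkpoint:
    """Hypothetical agent state persisted on disk across restarts."""
    slave_id: str
    boot_id: str
    executors: List[str] = field(default_factory=list)


@dataclass
class RecoveryPlan:
    """What the agent should do on startup (hypothetical structure)."""
    slave_id: Optional[str]     # None means: register as a brand-new agent
    reregister: bool            # True means: re-register under the old SlaveID
    lost_executors: List[str]   # executors to reconcile with the master as lost
    live_executors: List[str]   # executors the agent can try to reconnect to


def plan_recovery(checkpoint: Optional[Checkpoint],
                  current_boot_id: str) -> RecoveryPlan:
    # No checkpointed state at all: nothing to recover, register fresh.
    if checkpoint is None:
        return RecoveryPlan(None, False, [], [])

    # A changed boot ID is used here as the signal that the host rebooted.
    rebooted = checkpoint.boot_id != current_boot_id

    if rebooted:
        # Executors did not survive the reboot, but the SlaveID is kept so
        # the agent re-registers and reports those executors as lost.
        return RecoveryPlan(checkpoint.slave_id, True,
                            list(checkpoint.executors), [])

    # Plain agent restart without a reboot: executors may still be running.
    return RecoveryPlan(checkpoint.slave_id, True,
                        [], list(checkpoint.executors))
```

Under this model, a reboot changes only which executors are treated as lost; the agent's identity is preserved in both restart cases, which is the point of the improvement.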
Issue Links
- is related to
  - MESOS-1739 Allow slave reconfiguration on restart (Resolved)
  - MESOS-5368 Consider introducing persistent agent ID (Open)
- relates to
  - MESOS-8125 Agent should properly handle recovering an executor when its pid is reused (Resolved)
- supersedes
  - MESOS-5396 After failover, master does not remove agents with same UPID. (Accepted)