2) Kill all three from (1), in the order they were started.
3) Restart the master and agent. Do not restart the framework.
- The agent will reconnect to an orphaned task.
- The Web UI will report no memory usage.
- curl localhost:5050/metrics/snapshot will report: "master/mem_used": 128
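The metrics check in the last bullet can also be scripted. The following is a minimal sketch, assuming the snapshot endpoint returns a flat JSON object keyed by metric name and that the master listens on localhost:5050 as in the curl command above. The helper names master_mem_used and fetch_snapshot are made up for illustration.

```python
import json
from urllib.request import urlopen


def master_mem_used(snapshot_json):
    """Return the "master/mem_used" value from a metrics snapshot body."""
    return json.loads(snapshot_json)["master/mem_used"]


def fetch_snapshot(url="http://localhost:5050/metrics/snapshot"):
    """Fetch the raw snapshot body (URL assumed from the curl command above)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # A non-zero value here while the Web UI shows no memory usage
    # matches the symptom described in the steps above.
    print(master_mem_used(fetch_snapshot()))
```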
When a framework registers with the master, it provides a failover_timeout, in case the framework disconnects. If the framework disconnects and does not reconnect within this failover_timeout, the master will kill all tasks belonging to the framework.
However, the master does not persist this failover_timeout across master failover. The master will "forget" about a framework if both of the following happen:
1) The master dies before failover_timeout passes.
2) The framework dies while the master is dead.
When the master comes back up, the agent will re-register and report the orphaned task(s). Because the master failed over, it does not know these tasks are orphans (i.e., it thinks the frameworks might still re-register), and so it does not kill them.
The master should save the FrameworkID and failover_timeout in the registry. Upon recovery, the master should resume the failover_timeout timers.
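The proposed fix can be sketched as follows. This is an illustration only, not Mesos code: FrameworkRecord and FailoverTimers are hypothetical names, and the sketch makes the simplifying assumption that the full failover_timeout is restarted at recovery time, since the master cannot know when a framework disconnected while the master itself was down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameworkRecord:
    """Hypothetical registry entry persisted by the master."""
    framework_id: str
    failover_timeout: float  # seconds


class FailoverTimers:
    """Tracks failover deadlines for frameworks recovered from the registry."""

    def __init__(self):
        self._deadlines = {}  # framework_id -> absolute deadline in seconds

    def recover(self, records, now):
        # Assumption: restart the full timeout at recovery time.
        for record in records:
            self._deadlines[record.framework_id] = now + record.failover_timeout

    def reregister(self, framework_id):
        # A framework that comes back in time cancels its timer.
        self._deadlines.pop(framework_id, None)

    def expired(self, now):
        """Return (and forget) frameworks whose tasks should now be killed."""
        gone = sorted(fid for fid, d in self._deadlines.items() if now >= d)
        for fid in gone:
            del self._deadlines[fid]
        return gone
```

For example, after recovering two frameworks at time 100 with timeouts of 10 and 60 seconds, the second one re-registering before its deadline leaves only the first to be reported by expired(110.0). Its tasks would then be treated as orphans and killed.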