Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
None
Sprint: Mesos Q3 Sprint 6, Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2
3
Description
When a slave re-registers with the master, it currently sends the latest task state for all tasks that are not both terminal and acknowledged.
However, reconciliation assumes that the master always holds the latest unacknowledged state of each task.
As a result, the framework can receive updates out of order, for example:
(1) Slave has task T in TASK_FINISHED, with unacknowledged updates: [TASK_RUNNING, TASK_FINISHED].
(2) Master fails over.
(3) New master re-registers the slave with T in TASK_FINISHED.
(4) A reconciliation request arrives; the master replies with TASK_FINISHED.
(5) The slave retries the unacknowledged TASK_RUNNING update; the master forwards TASK_RUNNING to the framework, which has already seen TASK_FINISHED.
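The sequence in steps (1)-(5) can be replayed with a small, hypothetical Python model. It is not Mesos code: the state ranking, the function names, and the assumption that each pending update is acknowledged before the next one is sent are all illustrative. It shows the framework observing TASK_FINISHED before TASK_RUNNING.

```python
from typing import List

# Hypothetical, simplified model of steps (1)-(5); not Mesos source code.
# Rank of each state, used to detect a state "going backwards".
ORDER = {"TASK_STAGING": 0, "TASK_RUNNING": 1, "TASK_FINISHED": 2}


def replay_current_behavior() -> List[str]:
    """Returns the task states the framework observes, in delivery order."""
    # (1) Slave: T is TASK_FINISHED; both updates are still unacknowledged
    # (oldest first).
    latest_state = "TASK_FINISHED"
    unacked = ["TASK_RUNNING", "TASK_FINISHED"]

    # (2)-(3) Master fails over; on re-registration the slave reports T's
    # *latest* state, so the new master now believes T is TASK_FINISHED.
    master_view = latest_state

    observed = []
    # (4) Reconciliation: the master answers from its own view.
    observed.append(master_view)

    # (5) The slave retries its pending updates in order (assumed: each one is
    # acknowledged before the next is sent); the master forwards each one.
    for update in unacked:
        master_view = update
        observed.append(update)
    return observed


def first_regression(states: List[str]) -> int:
    """Index of the first state that is 'older' than its predecessor, or -1."""
    for i in range(1, len(states)):
        if ORDER[states[i]] < ORDER[states[i - 1]]:
            return i
    return -1


seen = replay_current_behavior()
print(seen)                    # ['TASK_FINISHED', 'TASK_RUNNING', 'TASK_FINISHED']
print(first_regression(seen))  # 1
```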
I think the fix is to preserve the task state invariants in the master, namely that the master holds the latest unacknowledged state of each task. This means that when the slave re-registers, it should instead send the latest acknowledged state of each task, so that the pending updates then reach the master in order.
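A minimal sketch of the proposed re-registration behaviour, again under a hypothetical model: the `UpdateStream` class and its methods are illustrative names, not Mesos APIs. It assumes that a task with no acknowledged updates reports TASK_STAGING as its latest acknowledged state. Because the master starts from the acknowledged state, the retried updates only ever move its view forward.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of the proposed fix; names are illustrative, not Mesos APIs.
ORDER = {"TASK_STAGING": 0, "TASK_RUNNING": 1, "TASK_FINISHED": 2}


@dataclass
class UpdateStream:
    """Slave-side status update stream for one task (simplified)."""
    acked_state: str                                  # latest acknowledged state
    pending: List[str] = field(default_factory=list)  # unacknowledged, oldest first

    def state_for_reregistration(self) -> str:
        # Proposed behaviour: report the latest *acknowledged* state, so the
        # master never gets ahead of the updates still waiting to be delivered.
        return self.acked_state

    def acknowledge(self) -> None:
        # The framework acknowledged the update at the head of the stream.
        self.acked_state = self.pending.pop(0)


def replay_with_fix(stream: UpdateStream) -> List[str]:
    """Returns the states the framework observes after a master failover."""
    observed = [stream.state_for_reregistration()]  # re-register, then reconcile
    while stream.pending:                           # retries, oldest first
        observed.append(stream.pending[0])
        stream.acknowledge()
    return observed


# Step (1): T was launched (assumed last acknowledged state TASK_STAGING)
# and has two pending updates.
seen = replay_with_fix(UpdateStream("TASK_STAGING", ["TASK_RUNNING", "TASK_FINISHED"]))
print(seen)  # ['TASK_STAGING', 'TASK_RUNNING', 'TASK_FINISHED']
assert all(ORDER[a] <= ORDER[b] for a, b in zip(seen, seen[1:]))
```

The trade-off in this sketch is that the master's view may briefly lag behind the slave, but it never runs ahead of the updates still queued for the framework.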
Issue Links
- is related to:
  - MESOS-1696 Improve reconciliation between master and slave. (Resolved)
  - MESOS-1817 Completed tasks remains in TASK_RUNNING when framework is disconnected (Resolved)