Details
-
Epic
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Reconciliation
Description
State inconsistencies can arise between the framework scheduler's view of tasks and the view of tasks within Mesos.
Frameworks such as Aurora have had to compensate for these inconsistencies by running a specialized executor on the slave that reconciles what happened on the slave against what the scheduler believes the current state of tasks to be.
This ticket tracks ways to let frameworks detect state inconsistencies in both of the following cases:
(1) There are tasks known to the framework, but unknown to Mesos. This can arise when the framework's intent was not carried out, or when a terminal event is not delivered to the framework.
(2) There are tasks known to Mesos but unknown to the framework. This can arise when the framework suffered information loss, assuming the framework always persists its intent prior to taking an action.
We recently added a reconciliation message that lets frameworks deal with (1), but there is nothing for (2) yet. Case (2) could be handled with an "implicit" form of the same reconciliation message. Alternatively, we could consider giving frameworks a way to receive the full list of tasks, which would let them reconcile both (1) and (2).
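To illustrate the full-list option, here is a minimal sketch of how a framework might compare its own view with a complete task list obtained from Mesos. It is not Mesos or Aurora code: `ReconciliationResult`, `reconcile`, and the task IDs are hypothetical names chosen for this example, and it assumes that both views can be reduced to sets of task IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconciliationResult:
    """Outcome of comparing the two views (hypothetical type, not a Mesos API)."""
    unknown_to_mesos: frozenset      # case (1): framework knows the task, Mesos does not
    unknown_to_framework: frozenset  # case (2): Mesos knows the task, framework does not


def reconcile(framework_task_ids, mesos_task_ids):
    """Compare the framework's task IDs against a full list reported by Mesos.

    Assumption: each view is an iterable of hashable task IDs.
    """
    framework_ids = frozenset(framework_task_ids)
    mesos_ids = frozenset(mesos_task_ids)
    return ReconciliationResult(
        # Intent not carried out, or a terminal event was never delivered.
        unknown_to_mesos=framework_ids - mesos_ids,
        # Framework lost information about tasks that exist in Mesos.
        unknown_to_framework=mesos_ids - framework_ids,
    )


# Example with made-up task IDs.
result = reconcile(
    framework_task_ids={"task-1", "task-2"},
    mesos_task_ids={"task-2", "task-3"},
)
print(sorted(result.unknown_to_mesos))
print(sorted(result.unknown_to_framework))
```

With a full list, both cases fall out of two set differences. With only the explicit reconciliation message, the framework can ask about the tasks it already knows, so the second difference cannot be computed on the framework side.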
Issue Links
- blocks
-
AURORA-715 Retire GC Executor
- Resolved