Mesos / MESOS-1407

Provide state reconciliation for frameworks.

    Details

    • Type: Epic
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Epic Name: Reconciliation

      Description

      State inconsistencies can arise between the framework scheduler's view of tasks and the view of tasks within Mesos.

      Frameworks such as Aurora have had to compensate for these inconsistencies by running a specialized executor on the slave, which reconciles what actually happened on the slave against the scheduler's view of the current task state.

      This ticket tracks ways to let frameworks detect state inconsistencies in both of the following cases:

      (1) There are tasks known to the framework, but unknown to Mesos. This can arise when the framework's intent was not carried out, or when a terminal event is not delivered to the framework.

      (2) There are tasks known to Mesos but unknown to the framework. This can arise when the framework suffered information loss, assuming the framework always persists its intent prior to taking an action.
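      The two cases above amount to a set difference in each direction between the two views. A minimal sketch, assuming each view can be reduced to a set of task IDs (the function name and the task IDs are hypothetical, not part of Mesos):

```python
from typing import Set, Tuple


def diff_task_views(
    framework_tasks: Set[str], mesos_tasks: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Compare two views of task IDs.

    Returns (unknown_to_mesos, unknown_to_framework):
      - unknown_to_mesos: case (1), known to the framework but not to Mesos
      - unknown_to_framework: case (2), known to Mesos but not to the framework
    """
    unknown_to_mesos = framework_tasks - mesos_tasks
    unknown_to_framework = mesos_tasks - framework_tasks
    return unknown_to_mesos, unknown_to_framework


# Hypothetical example data, for illustration only.
framework_view = {"task-a", "task-b", "task-c"}
mesos_view = {"task-b", "task-c", "task-d"}

unknown_to_mesos, unknown_to_framework = diff_task_views(framework_view, mesos_view)
print("case (1):", sorted(unknown_to_mesos))
print("case (2):", sorted(unknown_to_framework))
```

      The framework can only compute the second set if it can obtain the Mesos view in the first place, which is why case (2) needs more than a per-task query.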

      We have recently added a reconciliation message that allows frameworks to deal with (1), but there is nothing for (2) yet. Case (2) could be handled with an "implicit" form of the same reconciliation message. Alternatively, we could provide a way for frameworks to receive a full list of tasks, which would let them reconcile both (1) and (2).
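      One possible reading of the explicit versus "implicit" distinction is sketched below as a toy in-memory model. This is an assumption for illustration, not the Mesos API: the `reconcile` function, the `UNKNOWN` marker, and the state strings are all hypothetical, and the ticket does not say what Mesos would report for a task it does not know.

```python
from typing import Dict, Iterable, Optional

# Hypothetical marker for a task the queried side has no record of.
UNKNOWN = "UNKNOWN"


def reconcile(
    mesos_tasks: Dict[str, str],
    queried_ids: Optional[Iterable[str]] = None,
) -> Dict[str, str]:
    """Toy model of a reconciliation request.

    Explicit form: the framework names the tasks it knows about and gets a
    state back for each one, which covers case (1).
    Implicit form: the framework names no tasks and gets back every task the
    other side knows about, which lets it discover case (2).
    """
    ids = list(queried_ids) if queried_ids is not None else []
    if not ids:
        # Implicit: report everything that is known.
        return dict(mesos_tasks)
    # Explicit: answer only for the queried tasks.
    return {task_id: mesos_tasks.get(task_id, UNKNOWN) for task_id in ids}


# Hypothetical example data, for illustration only.
mesos_state = {"task-b": "RUNNING", "task-d": "RUNNING"}

explicit = reconcile(mesos_state, ["task-a", "task-b"])
implicit = reconcile(mesos_state)
print(explicit)
print(implicit)
```

      In this model the explicit query reveals that `task-a` is unknown on the other side, while only the implicit query surfaces `task-d`, which the framework never asked about.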

              People

              • Assignee: Benjamin Mahler (bmahler)
              • Reporter: Benjamin Mahler (bmahler)