MESOS-3545 (project: Mesos)

Investigate restoring tasks/executors after machine reboot.


    Details

    • Type: Epic
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: agent
    • Labels:
      None
    • Epic Name:
      Restartable Tasks

      Description

      If a task/executor is restartable (see MESOS-3544), it might make sense to force an agent to restart these tasks/executors after a machine reboot when the machine is network-partitioned away from the master (or the master has failed), because we'd still like to get these services running again. Assuming the agent(s) running on the machine have not been disconnected from the master for longer than the master's agent re-registration timeout, the agent should be able to re-register without a problem (e.g., once the network partition is resolved). However, a framework would be interested in knowing that its tasks/executors were restarted, so we'd want to send something like a TASK_RESTARTED status update.
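      The proposal above boils down to two steps: relaunch restartable tasks locally after the reboot, then deliver a restart notification once the agent re-registers within the master's timeout. The Python model below is a minimal sketch of that flow under stated assumptions, not Mesos code: `AgentModel`, `CheckpointedTask`, the `restartable` flag, `recover_after_reboot`, `reregister`, and the timeout expressed in seconds are all hypothetical names, and `TASK_RESTARTED` is only the status update proposed in this issue, not an existing Mesos task state.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical status name proposed by this issue; Mesos defines no such state.
TASK_RESTARTED = "TASK_RESTARTED"


@dataclass
class CheckpointedTask:
    task_id: str
    restartable: bool  # hypothetical stand-in for the MESOS-3544 notion


@dataclass
class AgentModel:
    """Toy model of an agent on a machine that rebooted while partitioned."""
    tasks: List[CheckpointedTask]
    running: List[str] = field(default_factory=list)
    pending_updates: List[Tuple[str, str]] = field(default_factory=list)

    def recover_after_reboot(self) -> None:
        # Relaunch restartable tasks without contacting the master and queue
        # a status update so the framework can later learn of the restart.
        for task in self.tasks:
            if task.restartable:
                self.running.append(task.task_id)
                self.pending_updates.append((task.task_id, TASK_RESTARTED))
            # Non-restartable tasks are left to the agent's normal recovery
            # path, which this sketch does not model.

    def reregister(self, disconnected_secs: float,
                   timeout_secs: float) -> List[Tuple[str, str]]:
        # Re-registration is assumed to succeed only if the agent has not been
        # disconnected for longer than the master's re-registration timeout.
        if disconnected_secs > timeout_secs:
            return []  # updates stay queued; the rejected case is not modeled
        delivered, self.pending_updates = self.pending_updates, []
        return delivered


agent = AgentModel([CheckpointedTask("web-1", True),
                    CheckpointedTask("batch-7", False)])
agent.recover_after_reboot()
print(agent.running)  # ['web-1']
print(agent.reregister(disconnected_secs=120, timeout_secs=600))
# [('web-1', 'TASK_RESTARTED')]
```

      Queuing the updates until re-registration mirrors the scenario in the description: the master is unreachable while the tasks restart, so the framework can only learn about the restarts once the agent reconnects.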

              People

              • Assignee:
                Megha Sharma (megha.sharma)
                Reporter:
                Benjamin Hindman (benjaminhindman)
                Shepherd:
                Yan Xu
              • Votes:
                1
                Watchers:
                18