Mesos / MESOS-3544

Support task and/or executor restart on failure.

Details

    • Type: Epic
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • Component/s: agent, HTTP API, master
    • Epic Name: Support task and/or executor restart on failure.

    Description

      In certain instances it might be preferable to restart a task/executor in place after it fails (i.e., exits with a non-zero exit code) rather than going through an entire status update -> offer -> accept (launch) cycle just to relaunch the task/executor on the same machine. This is especially true if the resources are reserved (dynamically or statically).

      Of course, we still want to highlight the restart to the framework, so introducing something like TASK_RESTARTED might be necessary (not sure what the analog would be for executors).

      Finally, if the task/executor has a bug we don't want to sit in an infinite loop, so we'll likely want to introduce this functionality in such a way as to limit the total restart attempts (or force a framework to have the proper authority to restart forever).
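
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the Mesos implementation: `run_with_restarts`, `max_restarts`, and `send_update` are hypothetical names, and `TASK_RESTARTED` is the state proposed in this ticket rather than an existing Mesos task state. The sketch relaunches a failed command in place, reports each restart, and gives up once a restart limit is reached.

```python
import subprocess
import sys
from typing import Callable, List


def run_with_restarts(
    command: List[str],
    max_restarts: int,
    send_update: Callable[[str], None],
) -> int:
    """Run `command`, restarting it in place whenever it exits non-zero.

    `max_restarts` bounds the number of restarts (not launches), so a
    buggy task cannot loop forever. Returns the final exit code.
    All names here are hypothetical and only illustrate the proposal.
    """
    restarts = 0
    send_update("TASK_RUNNING")
    while True:
        exit_code = subprocess.run(command).returncode
        if exit_code == 0:
            send_update("TASK_FINISHED")
            return exit_code
        if restarts >= max_restarts:
            # Restart budget exhausted: surface a terminal failure.
            send_update("TASK_FAILED")
            return exit_code
        restarts += 1
        # Proposed state from this ticket; assumed, not an existing state.
        send_update("TASK_RESTARTED")


if __name__ == "__main__":
    updates: List[str] = []
    always_fails = [sys.executable, "-c", "import sys; sys.exit(1)"]
    run_with_restarts(always_fails, max_restarts=2, send_update=updates.append)
    print(updates)
```

With `max_restarts=2`, a command that always fails is launched three times in total, and the framework would see two `TASK_RESTARTED` updates followed by a terminal `TASK_FAILED`. A framework with the authority to restart forever could be modeled by passing an unbounded limit, though how that authority would be granted is left open by the description.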

            People

              anandmazumdar Anand Mazumdar
              benjaminhindman Benjamin Hindman
              Votes: 1
              Watchers: 10
