Details
Type: Epic
Status: Accepted
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Support task and/or executor restart on failure.
Description
In certain cases it may be preferable to restart a task/executor in place after it fails (e.g., exits with a non-zero code) rather than going through an entire status update -> offer -> accept (launch) cycle just to relaunch it on the same machine. This is especially true if the resources are reserved (dynamically or statically), since the framework would most likely be re-offered those same reserved resources anyway.
Of course, we still want to surface the restart to the framework, so introducing a new task state such as TASK_RESTARTED might be necessary (it is not yet clear what the analog would be for executors).
Finally, if the task/executor has a bug, we don't want it to sit in an infinite restart loop, so this functionality should likely limit the total number of restart attempts (or require a framework to have the proper authorization to restart indefinitely).
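To make the proposed semantics concrete, here is a minimal Python sketch of an in-place supervisor with a bounded restart policy. Everything in it is hypothetical: `RestartPolicy`, `supervise`, and the `TASK_RESTARTED` state illustrate the idea in this issue and are not existing Mesos APIs (only `TASK_RUNNING`, `TASK_FAILED`, and `TASK_FINISHED` mirror real Mesos task state names). It assumes exit code 0 means success and any non-zero code means failure, and it models "proper authorization" as a simple boolean flag.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class TaskState(Enum):
    # TASK_RESTARTED is the hypothetical new state floated in this issue;
    # the other three mirror existing Mesos task state names.
    TASK_RUNNING = "TASK_RUNNING"
    TASK_RESTARTED = "TASK_RESTARTED"
    TASK_FAILED = "TASK_FAILED"
    TASK_FINISHED = "TASK_FINISHED"


@dataclass
class RestartPolicy:
    """Hypothetical per-task restart policy (not an existing Mesos type)."""

    # Maximum number of in-place restarts; None means "restart forever".
    max_restarts: Optional[int] = 3
    # Assumed stand-in for the authorization needed to restart forever.
    unlimited_authorized: bool = False

    def allows(self, restarts_so_far: int) -> bool:
        if self.max_restarts is None:
            # Unlimited restarts are only honored when explicitly authorized.
            return self.unlimited_authorized
        return restarts_so_far < self.max_restarts


def supervise(run_once: Callable[[], int], policy: RestartPolicy) -> List[TaskState]:
    """Run a task in place, restarting it after each non-zero exit while the
    policy allows. Returns the status updates the framework would receive."""
    updates = [TaskState.TASK_RUNNING]
    restarts = 0
    while True:
        exit_code = run_once()
        if exit_code == 0:
            updates.append(TaskState.TASK_FINISHED)
            return updates
        if not policy.allows(restarts):
            # Restart budget exhausted: fall back to a normal failure update.
            updates.append(TaskState.TASK_FAILED)
            return updates
        restarts += 1
        # Surface the restart to the framework instead of relaunching silently.
        updates.append(TaskState.TASK_RESTARTED)


# Usage: a task that fails twice and then succeeds, with a budget of 3 restarts.
codes = iter([1, 1, 0])
print([s.value for s in supervise(lambda: next(codes), RestartPolicy(max_restarts=3))])
# -> ['TASK_RUNNING', 'TASK_RESTARTED', 'TASK_RESTARTED', 'TASK_FINISHED']
```

Capping restarts by default and making unlimited restarts an explicit, authorized opt-in is one way to avoid the infinite-loop problem; the sketch does not settle how such a policy would actually be expressed in Mesos.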
Issue Links
is related to
- MESOS-6487 Add onTerminationPolicy to ExecutorInfo (Open)
- MESOS-7068 Add OnTerminationPolicy handling to the default executor. (Reviewable)