The existing native service framework assumes that a service is long-running and never finishes, so containers are restarted even when they exit with code 0.
To support broader use cases, we need to let users specify a restart policy per component. The proposed policies are:
1) Always: containers are always restarted by the framework, regardless of container exit status. This is the existing, default behavior.
2) Never: containers are not restarted in any case once they finish. This supports job-like workloads (for example, a TensorFlow training job), where a task that exits with code 0 should not be restarted. It can also be used by services that cannot be restarted or recovered.
3) On-failure: similar to Never, except that a task is restarted only if it exits with a non-zero exit code.
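The three policies above reduce to a small decision on the container exit code. The following is a minimal sketch of that decision, not the framework's implementation; the names `RestartPolicy` and `should_restart` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class RestartPolicy(Enum):
    """Hypothetical enum mirroring the three proposed policies."""
    ALWAYS = "ALWAYS"
    NEVER = "NEVER"
    ON_FAILURE = "ON_FAILURE"


def should_restart(policy: RestartPolicy, exit_code: int) -> bool:
    """Return True if a finished container should be restarted."""
    if policy is RestartPolicy.ALWAYS:
        # Existing/default behavior: restart regardless of exit status.
        return True
    if policy is RestartPolicy.NEVER:
        # Job-like workloads: never restart once the container finishes.
        return False
    # ON_FAILURE: restart only when the container exited with a non-zero code.
    return exit_code != 0
```

For example, under this sketch a training task that exits with code 0 is restarted only when the policy is Always, while a task that exits with code 1 is restarted under both Always and On-failure.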
Behavior after a component instance is finalized (Succeeded or Failed, when restart_policy != ALWAYS):
1) Single component, single instance: the service completes.
2) Single component, multiple instances: other running instances of the same component are not affected by the finalized component instance. The service is terminated once all instances are finalized.
3) Multiple components: the service is terminated once all components are finalized.
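All three cases above follow from one rule: a component is finalized when every one of its instances is finalized, and the service terminates when every component is finalized. The sketch below illustrates that rule under assumptions; the `Component` class, the state strings, and the function names are hypothetical and are not taken from the YARN code base.

```python
from dataclasses import dataclass
from typing import List

# Assumed terminal states of a component instance (illustrative names).
FINAL_STATES = {"SUCCEEDED", "FAILED"}


@dataclass
class Component:
    """Hypothetical view of a component: its name and its instances' states."""
    name: str
    instance_states: List[str]


def component_finalized(component: Component) -> bool:
    """A component is finalized once it has instances and all are terminal."""
    states = component.instance_states
    return bool(states) and all(s in FINAL_STATES for s in states)


def service_should_terminate(components: List[Component]) -> bool:
    """The service terminates once every component is finalized."""
    return bool(components) and all(component_finalized(c) for c in components)
```

A component whose policy is Always never reaches a terminal state in this model, so a service that contains such a component keeps running, which matches the existing long-running behavior.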
- relates to
YARN-8044 Determine the appropriate default ContainerRetryPolicy
YARN-8255 Allow option to disable flex for a service component
YARN-10449 Flexing doesn't consider containers which were stopped