Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Not A Bug
-
None
-
None
-
None
-
None
Description
See my comment here on YARN-3998. Need to take care of two things:
- The relaunch feature needs to work across NM restarts, so we should save the retry-context and policy per container into the state-store and reload it for continue relaunching after NM restart.
- We should also handle restarting of any containers that may have crashed during the NM reboot.