Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
None
None
None
Description
Currently the container exit status is expressed as a ContainerExitStatus code, which defines only 9 failure status codes. However, launching a container's shell command, or a set of such commands, can fail with many different system error codes. It would be useful to propagate that system error code to the AM in addition to the 9 status codes, which appear to cover only failures that result from YARN's own actions.
One use case: a resource the application needs in order to launch the container command is missing on the node. A "file not found" status would give the AM enough information to take corrective action before retrying.
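To illustrate the use case, here is a minimal Python sketch of how an AM could act on such a code. It assumes the system error would be delivered as a POSIX errno value next to the existing status. ContainerFailure, Action and choose_action are hypothetical names, not part of the YARN API, and the retry policy is only an example.

```python
import errno
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


@dataclass(frozen=True)
class ContainerFailure:
    """Hypothetical failure report an AM might receive (not a YARN class).

    yarn_status:  a ContainerExitStatus-style code describing what YARN did.
    system_error: the proposed extra field, i.e. the OS error code (errno)
                  raised while launching the container command, if any.
    """
    yarn_status: int
    system_error: Optional[int] = None


class Action(Enum):
    RETRY = auto()                        # plain retry, nothing to fix first
    RESTAGE_RESOURCE_THEN_RETRY = auto()  # a required resource is missing
    FAIL = auto()                         # retrying unchanged will not help


def choose_action(failure: ContainerFailure) -> Action:
    """Pick a corrective action before the next attempt (illustrative policy)."""
    if failure.system_error is None:
        # Only the coarse status is known, so the AM can do no better than retry.
        return Action.RETRY
    if failure.system_error == errno.ENOENT:
        # "No such file or directory": a resource the command needs is absent
        # on the node, so re-stage it (or choose another node) before retrying.
        return Action.RESTAGE_RESOURCE_THEN_RETRY
    if failure.system_error in (errno.EACCES, errno.EPERM):
        # Permission errors are unlikely to go away on a blind retry.
        return Action.FAIL
    # Any other system error: fall back to a plain retry.
    return Action.RETRY
```

Without the system error field, every failure in this sketch collapses to a plain retry; with it, the "file not found" case can be routed to a fix-then-retry path.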