Description
When a job runs with a split metainfo file that is larger than the configured maximum, the AM simply crashes. This makes the failure hard to debug, since no useful diagnostic messages are sent to the client. In addition, the AM crashes before registering/unregistering with the RM and without generating history, so the proxy URL is not very useful and there is no archived configuration to check which setting the AM was using when it encountered the error.
The AM should handle this error case more gracefully and treat the failure as it does any other failed job, with a proper unregistration from the RM and with history.
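As a rough illustration of the requested behavior, here is a minimal, self-contained sketch. It is not the actual MRAppMaster code: `AppMasterContext`, `FinalStatus`, `readSplitMetaInfo`, `runJob`, and the error message text are all hypothetical stand-ins. The idea is that the size check raises an ordinary exception, which the AM turns into a failed job with diagnostics, written history, and a normal unregistration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SplitMetaInfoFailureSketch {

    /** Hypothetical final states the AM could report to the RM. */
    public enum FinalStatus { SUCCEEDED, FAILED }

    /** Hypothetical stand-in for the AM's interaction with the RM and job history. */
    public static class AppMasterContext {
        public final List<String> events = new ArrayList<>();
        public FinalStatus finalStatus;
        public String diagnostics;

        void register() { events.add("register"); }

        void writeHistory() { events.add("history"); }

        void unregister(FinalStatus status, String diag) {
            finalStatus = status;
            diagnostics = diag;
            events.add("unregister");
        }
    }

    /** Mimics the size check on the split metainfo file (assumed behavior). */
    static void readSplitMetaInfo(long fileSize, long maxSize) throws IOException {
        if (maxSize > 0 && fileSize > maxSize) {
            throw new IOException("Split metainfo size " + fileSize
                    + " exceeds configured maximum " + maxSize);
        }
    }

    /** Runs the hypothetical job, treating an oversized file as a normal job failure. */
    public static AppMasterContext runJob(long fileSize, long maxSize) {
        AppMasterContext ctx = new AppMasterContext();
        ctx.register();                       // register before anything can fail
        FinalStatus status = FinalStatus.SUCCEEDED;
        String diag = "";
        try {
            readSplitMetaInfo(fileSize, maxSize);
        } catch (IOException e) {
            status = FinalStatus.FAILED;      // fail the job instead of crashing
            diag = e.getMessage();            // diagnostics for the client
        }
        ctx.writeHistory();                   // history is written either way
        ctx.unregister(status, diag);         // proper unregistration from the RM
        return ctx;
    }

    public static void main(String[] args) {
        AppMasterContext ctx = runJob(2048L, 1024L);
        System.out.println(ctx.finalStatus + ": " + ctx.diagnostics);
        System.out.println(ctx.events);
    }
}
```

The point of the sketch is only the ordering: registration happens before the split metainfo is read, and the failure path goes through the same history and unregistration steps as any other failed job.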
Attachments
Issue Links
- is related to YARN-522 [Umbrella] Better reporting for crashed/Killed AMs and Containers (Open)