Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 2.1.1
- None
Description
We recently ran into SPARK-18016, which has been fixed in v2.3.0. This JIRA is not about the issue in SPARK-18016 itself but about a side effect it causes. When SPARK-18016 occurs, the ApplicationMaster fails to unregister itself because the exception carries an extremely large error message.
ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: Error while decoding: java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM limit of 0xFFFF
/* 001 */ public java.lang.Object generate(Object[] references) {
....
/* 395656 */ mutableRow.update(0, value);
/* 395657 */ }
/* 395658 */
/* 395659 */ return mutableRow;
/* 395660 */ }
/* 395661 */ }
The codegen text above is included in the final message that the AM sends to the RM when it unregisters. That message ends up crashing the RM's ZKRMStateStore, because YARN-6125 does not cover truncation of the unregisterApplicationMaster message. We also filed a JIRA on the YARN side: https://issues.apache.org/jira/browse/YARN-8691
Although SPARK-18016 has already been fixed, other uncaught exceptions may still cause this problem. I think we should limit the size of the error message sent to the RM when unregistering the AM.
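To make the proposal concrete, here is a minimal sketch of capping the message before it is handed to the unregister call. The class name DiagnosticsLimiter, the method limit, the marker text, and the maxChars parameter are all hypothetical and chosen for illustration only; this is not the code in Spark's ApplicationMaster, and the right limit would have to be picked to suit the RM state store.

```java
/** Hypothetical helper: caps a diagnostics string before it is sent to the RM. */
final class DiagnosticsLimiter {
    // Marker appended when the message is cut; the wording is illustrative only.
    static final String MARKER = "... [truncated]";

    private DiagnosticsLimiter() {}

    /**
     * Returns the message unchanged if it has at most maxChars characters,
     * otherwise its head followed by MARKER, at most maxChars characters in total.
     */
    static String limit(String message, int maxChars) {
        if (maxChars < MARKER.length()) {
            throw new IllegalArgumentException(
                "maxChars must be at least " + MARKER.length());
        }
        if (message == null || message.length() <= maxChars) {
            return message;
        }
        int cut = maxChars - MARKER.length();
        // Do not split a surrogate pair at the cut point.
        if (cut > 0 && Character.isHighSurrogate(message.charAt(cut - 1))) {
            cut--;
        }
        return message.substring(0, cut) + MARKER;
    }
}
```

The AM would pass something like DiagnosticsLimiter.limit(finalMessage, someLimit) instead of the raw exception text when unregistering. Note that this sketch counts characters, not bytes, so it only approximates the serialized size that the state store actually sees.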
Issue Links
- relates to: YARN-8691 AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum size (Resolved)
- links to