Details
- Bug
- Status: Closed
- Blocker
- Resolution: Fixed
- 2.3.0
- None
- Reviewed
Description
Sometimes the RM fails to recover an application. Possible causes include security being turned on, token expiry, or problems connecting to HDFS. The causes can be classified as (1) transient, (2) specific to one application, or (3) permanent and applying to multiple (possibly all) applications. Today, the RM fails to transition to Active, ends up in the STOPPED state, and can never be transitioned to Active again.
The initially reported stack trace is at https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf
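The three failure classes in the description suggest three different reactions, which a small sketch may make easier to discuss. The code below is only an illustration and is not the RM's code or the change made for this issue: `FailureKind`, `RecoveryError`, `recover_all`, and the `max_retries` parameter are all hypothetical names. It assumes that transient failures are retried, that an application-specific failure skips only that application, and that a permanent failure aborts recovery.

```python
from enum import Enum, auto


class FailureKind(Enum):
    """Hypothetical labels for the three classes of recovery failure."""
    TRANSIENT = auto()      # e.g. a brief problem connecting to HDFS
    APP_SPECIFIC = auto()   # e.g. one application's token has expired
    PERMANENT = auto()      # affects multiple (possibly all) applications


class RecoveryError(Exception):
    """Hypothetical error raised when recovering one application fails."""

    def __init__(self, kind, message=""):
        super().__init__(message)
        self.kind = kind


def recover_all(app_ids, recover_one, max_retries=3):
    """Try to recover every application and return (recovered, skipped).

    Assumed policy: retry transient failures up to max_retries times,
    skip an application on an app-specific failure (or once its retries
    are exhausted), and re-raise permanent failures to the caller.
    """
    recovered, skipped = [], []
    for app_id in app_ids:
        attempts = 0
        while True:
            try:
                recover_one(app_id)
                recovered.append(app_id)
                break
            except RecoveryError as err:
                if err.kind is FailureKind.PERMANENT:
                    raise  # nothing useful can be done per application
                if err.kind is FailureKind.TRANSIENT and attempts < max_retries:
                    attempts += 1
                    continue  # try the same application again
                # App-specific failure, or transient retries exhausted.
                skipped.append(app_id)
                break
    return recovered, skipped
```

With this policy, one unrecoverable application ends up in the `skipped` list instead of preventing the remaining applications from being recovered. Whether exhausted transient retries should skip the application or abort recovery is a design choice this sketch makes arbitrarily.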
Attachments
Issue Links
- is related to
  - YARN-2862 RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used (Open)