Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Duplicate
-
3.0.0
-
None
-
None
Description
RM crashes with NPE during failover because ACL configurations were changed as a result we no longer have a rights to submit an application to a queue.
Scenario:
- Submit an application
- Change ACL configuration for a queue that accepted the application so that an owner of the application will no longer have a rights to submit this application.
- Restart RM.
As a result, we get NPE:
2018-02-27 18:14:00,968 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state STARTED; cause: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:738)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1286)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1098)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1044)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385
Attachments
Attachments
Issue Links
- duplicates
-
YARN-7913 Improve error handling when application recovery fails with exception
- Resolved