Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
2.7.0
None
Environment: SUSE 11 SP3, 2 RMs, secure mode
Description
Steps to reproduce
================
1. Configure cluster in secure mode
2. On the RMs, configure yarn.admin.acl=dsperf
3. Configure yarn.resourcemanager.principal=yarn
4. Start both RMs
Result: both RMs remain in Standby forever.
2015-06-15 12:20:21,556 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=refreshAdminAcls TARGET=AdminService RESULT=FAILURE DESCRIPTION=Unauthorized user PERMISSIONS=
2015-06-15 12:20:21,556 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:645)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:518)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Can not execute refreshAdminAcls
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
    ... 4 more
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User yarn doesn't have permission to call 'refreshAdminAcls'
    at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAcls(AdminService.java:230)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAdminAcls(AdminService.java:465)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:295)
    ... 5 more
Caused by: org.apache.hadoop.security.AccessControlException: User yarn doesn't have permission to call 'refreshAdminAcls'
    at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:182)
    at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:148)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAccess(AdminService.java:223)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAcls(AdminService.java:228)
    ... 7 more
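The innermost cause in the trace is an admin ACL check that rejects the RM's own user. The sketch below is a simplified, hypothetical model of that check, not Hadoop's AccessControlList class; `AdminAclCheck` and `isAllowed` are illustrative names. It assumes yarn.admin.acl uses the usual Hadoop ACL format: comma-separated users, a space, then comma-separated groups, with `*` meaning everyone.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class AdminAclCheck {

    // Simplified, hypothetical model of a Hadoop-style ACL string
    // "user1,user2 group1,group2". "*" allows everyone; a leading space
    // means "no users, only groups". This is NOT Hadoop's AccessControlList.
    public static boolean isAllowed(String acl, String user, Set<String> userGroups) {
        if (acl.trim().equals("*")) {
            return true;
        }
        int space = acl.indexOf(' ');
        String usersPart = space < 0 ? acl : acl.substring(0, space);
        String groupsPart = space < 0 ? "" : acl.substring(space + 1);

        if (splitNames(usersPart).contains(user)) {
            return true;
        }
        for (String group : splitNames(groupsPart)) {
            if (userGroups.contains(group)) {
                return true;
            }
        }
        return false;
    }

    // Splits a comma-separated list, trimming whitespace and dropping empty entries.
    private static Set<String> splitNames(String csv) {
        Set<String> names = new HashSet<>();
        for (String name : csv.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Set<String> noGroups = Collections.emptySet();
        // Values from the report: yarn.admin.acl=dsperf, RM principal short name "yarn".
        System.out.println(isAllowed("dsperf", "yarn", noGroups));      // false
        System.out.println(isAllowed("dsperf,yarn", "yarn", noGroups)); // true
        System.out.println(isAllowed("*", "yarn", noGroups));           // true
    }
}
```

With the configuration from the steps to reproduce, the model rejects `yarn`, which matches the "User yarn doesn't have permission to call 'refreshAdminAcls'" error in the trace.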
Analysis
On each attempt by an RM to transition to Active, refreshAdminAcls is called, and the user it runs as (yarn) does not have admin ACL permission.
As a result, the transition to Active is retried indefinitely, and ActiveStandbyElector#becomeActive() always returns false.
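The loop described in the analysis can be modeled in a few lines. This is a hypothetical sketch: `StandbyRetryModel`, `tryBecomeActive`, and the attempt cap are illustrative only, and the admin ACL is reduced to a plain set of user names. It shows why retrying cannot help: the daemon user and the admin list are identical on every attempt.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class StandbyRetryModel {

    // Hypothetical stand-in for one "become Active" attempt. Per the report,
    // the RM calls refreshAdminAcls as its own user (yarn) during the
    // transition; if that user is not an admin, the attempt fails.
    public static boolean tryBecomeActive(Set<String> adminUsers, String daemonUser) {
        return adminUsers.contains(daemonUser);
    }

    // Counts failed attempts before success, capped at maxAttempts so the
    // demo terminates. Neither input changes between attempts, so a failing
    // configuration fails every time, like the endless Standby loop reported.
    public static int failedAttempts(Set<String> adminUsers, String daemonUser, int maxAttempts) {
        int failures = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (tryBecomeActive(adminUsers, daemonUser)) {
                return failures;
            }
            failures++;
        }
        return failures;
    }

    public static void main(String[] args) {
        Set<String> admins = Collections.singleton("dsperf");
        System.out.println(failedAttempts(admins, "yarn", 5)); // 5: never becomes Active
        Set<String> withYarn = new HashSet<>(Arrays.asList("dsperf", "yarn"));
        System.out.println(failedAttempts(withYarn, "yarn", 5)); // 0
    }
}
```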
Expected
The RM should receive a shutdown event after a few retries, or even on the first attempt, since the user under which it calls refreshAdminAcls cannot change at runtime.
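The expected behavior can be sketched as a bounded retry loop that treats an authorization failure as non-retriable. This is a hypothetical illustration, not the actual fix: `FailFastElection`, `NonRetriableException`, `Transition`, and the `maxAttempts` limit are invented names, and the real RM and elector code is not modeled here.

```java
public class FailFastElection {

    public enum Outcome { ACTIVE, SHUTDOWN }

    // Hypothetical marker for failures that cannot succeed on retry, such as
    // the AccessControlException for 'refreshAdminAcls' in this report.
    public static class NonRetriableException extends Exception {
        public NonRetriableException(String message) {
            super(message);
        }
    }

    // Hypothetical hook for one transition-to-Active attempt.
    public interface Transition {
        void becomeActive() throws Exception;
    }

    // Retries transient failures up to maxAttempts times, but gives up on the
    // first non-retriable failure. Either way, a failed run ends in SHUTDOWN
    // instead of looping in Standby forever.
    public static Outcome run(Transition transition, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                transition.becomeActive();
                return Outcome.ACTIVE;
            } catch (NonRetriableException e) {
                return Outcome.SHUTDOWN;
            } catch (Exception e) {
                // Transient failure: try again on the next iteration.
            }
        }
        return Outcome.SHUTDOWN;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Outcome outcome = run(() -> {
            calls[0]++;
            throw new NonRetriableException("User yarn doesn't have permission to call 'refreshAdminAcls'");
        }, 5);
        System.out.println(outcome + " after " + calls[0] + " attempt(s)"); // SHUTDOWN after 1 attempt(s)
    }
}
```

Separating non-retriable errors from transient ones covers both options in the expectation: transient failures still get a few retries, while an ACL rejection that cannot change at runtime stops the RM on the first attempt.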
Service states reported by yarn rmadmin
./yarn rmadmin -getServiceState rm2
standby
./yarn rmadmin -getServiceState rm1
standby
./yarn rmadmin -checkHealth rm1
echo $? = 0
./yarn rmadmin -checkHealth rm2
echo $? = 0