  1. Hadoop YARN
  2. YARN-7884

Race condition in registering YARN service in ZooKeeper

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: yarn-native-services
    • Labels: None

    Description

      In a Kerberos-enabled cluster, there seems to be a race condition when registering a YARN service.

      Creation of the yarn-service znode seems to happen after the AM has started and is already reporting back to update component information. The service user should have permission to create the znode, but for some reason the create call reports NoAuth.

      2018-02-02 22:53:30,442 [main] INFO  service.ServiceScheduler - Set registry user accounts: sasl:hbase
      2018-02-02 22:53:30,471 [main] INFO  zk.RegistrySecurity - Registry default system acls: 
      [1,s{'world,'anyone}
      , 31,s{'sasl,'yarn}
      , 31,s{'sasl,'jhs}
      , 31,s{'sasl,'hdfs-demo}
      , 31,s{'sasl,'rm}
      , 31,s{'sasl,'hive}
      ]
      2018-02-02 22:53:30,472 [main] INFO  zk.RegistrySecurity - Registry User ACLs 
      [31,s{'sasl,'hbase}
      , 31,s{'sasl,'hbase}
      ]
      2018-02-02 22:53:30,503 [main] INFO  event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.ComponentEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler
      2018-02-02 22:53:30,504 [main] INFO  event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler
      2018-02-02 22:53:30,528 [main] INFO  impl.NMClientAsyncImpl - Upper bound of the thread pool size is 500
      2018-02-02 22:53:30,531 [main] INFO  service.ServiceMaster - Starting service as user hbase/eyang-5.openstacklocal@EXAMPLE.COM (auth:KERBEROS)
      2018-02-02 22:53:30,545 [main] INFO  ipc.CallQueueManager - Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
      2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO  ipc.Server - Starting Socket Reader #1 for port 56859
      2018-02-02 22:53:30,589 [main] INFO  pb.RpcServerFactoryPBImpl - Adding protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to the server
      2018-02-02 22:53:30,606 [IPC Server Responder] INFO  ipc.Server - IPC Server Responder: starting
      2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO  ipc.Server - IPC Server listener on 56859: starting
      2018-02-02 22:53:30,607 [main] INFO  service.ClientAMService - Instantiated ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859
      2018-02-02 22:53:30,609 [main] INFO  zk.CuratorService - Creating CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181" 
      2018-02-02 22:53:30,615 [main] INFO  zk.RegistrySecurity - Enabling ZK sasl client: jaasClientEntry = Client, principal = hbase/eyang-5.openstacklocal@EXAMPLE.COM, keytab = /etc/security/keytabs/hbase.service.keytab
      2018-02-02 22:53:30,752 [main] INFO  client.RMProxy - Connecting to ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032
      2018-02-02 22:53:30,909 [main] INFO  service.ServiceScheduler - Registering appattempt_1517611904996_0001_000001, abc into registry
      2018-02-02 22:53:30,911 [main] INFO  service.ServiceScheduler - Received 0 containers from previous attempt.
      2018-02-02 22:53:31,072 [main] INFO  service.ServiceScheduler - Could not read component paths: `/users/hbase/services/yarn-service/abc/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hbase/services/yarn-service/abc/components
      2018-02-02 22:53:31,074 [main] INFO  service.ServiceScheduler - Triggering initial evaluation of component sleeper
      2018-02-02 22:53:31,075 [main] INFO  component.Component - [INIT COMPONENT sleeper]: 2 instances.
      2018-02-02 22:53:31,094 [main] INFO  component.Component - [COMPONENT sleeper] Transitioned from INIT to FLEXING on FLEX event.
      2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - Failed to register app abc in registry
      org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: `/registry/users/hbase/services/yarn-service/abc': Not authorized to access path; ACLs: [
      0x01: 'world,'anyone
       0x1f: 'sasl,'yarn
       0x1f: 'sasl,'jhs
       0x1f: 'sasl,'hdfs-demo
       0x1f: 'sasl,'rm
       0x1f: 'sasl,'hive
       0x1f: 'sasl,'hbase
       0x1f: 'sasl,'hbase
       ]: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc
      	at org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:412)
      	at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:637)
      	at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkSet(CuratorService.java:679)
      	at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.bind(RegistryOperationsService.java:116)
      	at org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.putService(YarnRegistryViewForProviders.java:195)
      	at org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.registerSelf(YarnRegistryViewForProviders.java:210)
      	at org.apache.hadoop.yarn.service.ServiceScheduler$2.run(ServiceScheduler.java:462)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
      	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:740)
      	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:723)
      	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
      	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:720)
      	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:484)
      	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:474)
      	at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:260)
      	at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:214)
      	at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:635)
      	... 12 more
      2018-02-02 22:53:33,135 [AMRM Callback Handler Thread] INFO  service.ServiceScheduler - 2 containers allocated. 
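
For reading the ACL dumps in the log above: the leading numbers (1, 31, 0x01, 0x1f) are ZooKeeper permission bitmasks, where READ=1, WRITE=2, CREATE=4, DELETE=8 and ADMIN=16. The following is a minimal sketch that decodes them; the class name `AclPerms` and the method `decode` are illustrative only and are not part of Hadoop or ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;

public class AclPerms {
    // Permission bit values as defined by ZooKeeper (ZooDefs.Perms).
    static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;

    // Hypothetical helper: turn a permission bitmask into a readable string.
    static String decode(int perms) {
        List<String> names = new ArrayList<>();
        if ((perms & READ) != 0) names.add("READ");
        if ((perms & WRITE) != 0) names.add("WRITE");
        if ((perms & CREATE) != 0) names.add("CREATE");
        if ((perms & DELETE) != 0) names.add("DELETE");
        if ((perms & ADMIN) != 0) names.add("ADMIN");
        return names.isEmpty() ? "NONE" : String.join("|", names);
    }

    public static void main(String[] args) {
        // The two masks that appear in the log above.
        System.out.println("0x01 -> " + decode(0x01)); // 0x01 -> READ
        System.out.println("0x1f -> " + decode(0x1f)); // 0x1f -> READ|WRITE|CREATE|DELETE|ADMIN
    }
}
```

So `'world,'anyone` is read-only and every `sasl` entry, including `hbase`, carries all five permissions. Note that ZooKeeper checks CREATE against the ACL of the parent znode, so the parent path's ACL may be the one worth inspecting when the create call returns NoAuth.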
      

          People

            Assignee: Unassigned
            Reporter: Eric Yang (eyang)