Flink / FLINK-6376

When deploying a Flink cluster on YARN, the HDFS delegation token is missing.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0, 1.3.2
    • Component/s: Security, YARN
    • Labels:
      None

      Description

      1. I use Flink 1.2.0 and deploy the Flink cluster on YARN. The Hadoop version is 2.7.2.
      2. I run Flink in secure mode with a keytab and principal. The key configuration is: security.kerberos.login.keytab: /home/ketab/test.keytab and security.kerberos.login.principal: test.
      3. The YARN configuration is the default, with log aggregation enabled: "yarn.log-aggregation-enable: true".
      4. When the Flink cluster is deployed on YARN, the YARN NodeManager fails with the following error while aggregating logs to HDFS. The root cause is the missing HDFS delegation token.
      java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "SZV1000258954/10.162.181.24"; destination host is: "SZV1000258954":25000;
      at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:796)
      at org.apache.hadoop.ipc.Client.call(Client.java:1515)
      at org.apache.hadoop.ipc.Client.call(Client.java:1447)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
      at com.sun.proxy.$Proxy26.getFileInfo(Unknown Source)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:802)
      at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:201)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
      at com.sun.proxy.$Proxy27.getFileInfo(Unknown Source)
      at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1919)
      at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1500)
      at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1496)
      at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1496)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:271)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:68)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:299)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1769)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:284)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:390)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:342)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:470)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68)
      at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:194)
      at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
      at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:722)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1769)
      at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:685)
      at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:772)
      at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:394)
      at org.apache.hadoop.ipc.Client.getConnection(Client.java:1564)
      at org.apache.hadoop.ipc.Client.call(Client.java:1486)
      ... 29 more
      Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
      at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:177)
      at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:404)
      at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:581)
      at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:394)
      at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:764)
      at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:760)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1769)
      at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:759)
      ... 32 more
      5. Since the Hadoop fix HADOOP-14116 (https://issues.apache.org/jira/browse/HADOOP-14116), when there is no HDFS delegation token, the call is retried 20 times with a 1-second sleep between attempts. This makes deploying the Flink cluster on YARN very slow: it takes about 5 minutes to deploy a cluster with 2 TaskManagers.
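
To put the retry cost from step 5 in perspective, here is a minimal Java sketch of the arithmetic only. It assumes each failing call sleeps the full 20 x 1 second before giving up and ignores connection and scheduling time. The class and method names are hypothetical and are not part of Hadoop or Flink.

```java
import java.time.Duration;

public class RetryDelayEstimate {

    /** Lower bound on the time one failing call spends sleeping under a fixed-sleep retry policy. */
    static Duration minDelayPerFailedCall(int maxRetries, Duration sleepBetweenRetries) {
        return sleepBetweenRetries.multipliedBy(maxRetries);
    }

    /** How many fully exhausted retry cycles fit into an observed total delay. */
    static long failedCallsWithin(Duration observed, Duration perCall) {
        return observed.getSeconds() / perCall.getSeconds();
    }

    public static void main(String[] args) {
        // Values taken from the report: 20 retries, 1 second sleep, about 5 minutes in total.
        Duration perCall = minDelayPerFailedCall(20, Duration.ofSeconds(1));
        long cycles = failedCallsWithin(Duration.ofMinutes(5), perCall);

        System.out.println("Sleep per failed call: " + perCall.getSeconds() + " s");
        System.out.println("Exhausted retry cycles that fit in 5 minutes: " + cycles);
    }
}
```

Under these assumptions, a single unauthenticated call costs at least 20 seconds, so a handful of such calls during container startup would be enough to reach the observed delay of several minutes. This is an estimate, not a measurement from the affected cluster.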

              People

              • Assignee:
                cong zhangrucong1982
                Reporter:
                cong zhangrucong1982
              • Votes:
                0
                Watchers:
                3
