  1. Hadoop Common
  2. HADOOP-15864

Job submitter / executor fails when the SBN domain name cannot be resolved

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 3.0.4, 3.3.0, 3.1.2
    • Component/s: None
    • Labels:
      None

      Description

      Job submission and task execution fail if the Standby NameNode domain name cannot be resolved on HDFS HA with the DelegationToken feature enabled.

      The issue is triggered when a ConfiguredFailoverProxyProvider instance is created: in HA mode with security enabled, its constructor invokes HAUtil.cloneDelegationTokenForLogicalUri. In HDFS HA mode the UGI needs to include a separate token for each NameNode in order to deal with an Active-Standby switch; the content of these tokens is of course the same.
      However, when HAUtil.cloneDelegationTokenForLogicalUri calls #setTokenService, it checks whether the NameNode address has been resolved. If it has not, #IllegalArgumentException is thrown, and the job submitter / task executor then fails.

      HDFS-8068 and HADOOP-12125 tried to fix this, but I don't think the two tickets resolve it completely.
      Another question many people raise is why the NameNode domain name cannot be resolved. I think there are many scenarios, for instance a node being replaced after a fault, or an occasional DNS refresh. In any case, a Standby NameNode failure should not impact Hadoop cluster stability, in my opinion.
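
One way to picture the behaviour argued for above is a clone step that skips NameNodes whose address is unresolved instead of aborting. The sketch below uses only JDK types. The class `TolerantTokenClone` and the method `servicesForResolved` are hypothetical names introduced for illustration; they are not Hadoop APIs and not necessarily what the attached patches do.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TolerantTokenClone {

    // Hypothetical helper (not part of Hadoop): builds an "ip:port" service
    // name for every NameNode address that resolved, and skips the others
    // with a warning instead of throwing.
    static List<String> servicesForResolved(List<InetSocketAddress> nnAddrs) {
        List<String> services = new ArrayList<>();
        for (InetSocketAddress addr : nnAddrs) {
            if (addr.isUnresolved()) {
                System.err.println("Skipping token clone for unresolved NameNode "
                    + addr.getHostName());
                continue;
            }
            services.add(addr.getAddress().getHostAddress() + ":" + addr.getPort());
        }
        return services;
    }

    public static void main(String[] args) {
        // An IP literal resolves without a DNS lookup; createUnresolved never
        // attempts one, so this runs the same way on any machine.
        List<InetSocketAddress> nnAddrs = Arrays.asList(
            new InetSocketAddress("127.0.0.1", 8020),
            InetSocketAddress.createUnresolved("standbynamenode", 8020));
        System.out.println(servicesForResolved(nnAddrs));
    }
}
```

The likely trade-off of skipping is that no token is registered for the unresolved NameNode, so a client started in this state may still need to handle a later failover to that node.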

      a. code ref: org.apache.hadoop.security.SecurityUtil, lines 373-386

        public static Text buildTokenService(InetSocketAddress addr) {
          String host = null;
          if (useIpForTokenService) {
            if (addr.isUnresolved()) { // host has no ip address
              throw new IllegalArgumentException(
                  new UnknownHostException(addr.getHostName())
              );
            }
            host = addr.getAddress().getHostAddress();
          } else {
            host = StringUtils.toLowerCase(addr.getHostName());
          }
          return new Text(host + ":" + addr.getPort());
        }
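
The failure mode of the method above can be reproduced without a cluster. The following is a minimal sketch, assuming a simplified stand-in for `buildTokenService` that returns a `String` instead of Hadoop's `Text` and takes the `useIpForTokenService` setting as a parameter; the class name `TokenServiceDemo` and port 8020 are illustrative choices, not taken from the report.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.Locale;

public class TokenServiceDemo {

    // Simplified stand-in for SecurityUtil.buildTokenService (assumption:
    // String instead of Text, and the IP/hostname switch passed explicitly).
    static String buildTokenService(InetSocketAddress addr, boolean useIp) {
        String host;
        if (useIp) {
            if (addr.isUnresolved()) { // host has no IP address
                throw new IllegalArgumentException(
                    new UnknownHostException(addr.getHostName()));
            }
            host = addr.getAddress().getHostAddress();
        } else {
            host = addr.getHostName().toLowerCase(Locale.ENGLISH);
        }
        return host + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        // createUnresolved never performs a DNS lookup, so it models a
        // Standby NameNode whose name cannot currently be resolved.
        InetSocketAddress standby =
            InetSocketAddress.createUnresolved("standbynamenode", 8020);

        // Hostname-based service names do not need a resolved address.
        System.out.println(buildTokenService(standby, false));

        // IP-based service names fail for an unresolved address.
        try {
            buildTokenService(standby, true);
        } catch (IllegalArgumentException e) {
            System.out.println("failed: " + e.getCause());
        }
    }
}
```

Only the IP-based branch depends on resolution, which is why a single unresolvable Standby NameNode is enough to make the call throw.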
      

      b. exception log ref:

      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
      at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515)
      at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:170)
      at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:761)
      at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:691)
      at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:150)
      at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
      at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
      at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
      at org.apache.hadoop.fs.viewfs.ChRootedFileSystem.<init>(ChRootedFileSystem.java:106)
      at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:178)
      at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:172)
      at org.apache.hadoop.fs.viewfs.InodeTree.createLink(InodeTree.java:303)
      at org.apache.hadoop.fs.viewfs.InodeTree.<init>(InodeTree.java:377)
      at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.<init>(ViewFileSystem.java:172)
      at org.apache.hadoop.fs.viewfs.ViewFileSystem.initialize(ViewFileSystem.java:172)
      at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
      at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
      at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:176)
      at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:665)
      ... 35 more
      Caused by: java.lang.reflect.InvocationTargetException
      at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:498)
      ... 58 more
      Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: standbynamenode
      at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:390)
      at org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:369)
      at org.apache.hadoop.hdfs.HAUtil.cloneDelegationTokenForLogicalUri(HAUtil.java:317)
      at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:132)
      at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:84)
      ... 62 more
      Caused by: java.net.UnknownHostException: standbynamenode
      ... 67 more
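
The trace above nests the real cause four levels deep (IOException, InvocationTargetException, IllegalArgumentException, UnknownHostException). As a diagnostic aid, a caller could walk the cause chain to tell this DNS failure apart from other proxy-provider errors. This is a sketch only; `RootCause` and its methods are hypothetical names, not Hadoop or Spark APIs.

```java
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.net.UnknownHostException;

public class RootCause {

    // Follows getCause() until the innermost throwable is reached.
    static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        while (cur.getCause() != null && cur.getCause() != cur) {
            cur = cur.getCause();
        }
        return cur;
    }

    // True when the innermost cause is a failed host name resolution.
    static boolean causedByUnknownHost(Throwable t) {
        return rootCause(t) instanceof UnknownHostException;
    }

    public static void main(String[] args) {
        // Rebuild the same nesting as in the reported trace.
        Throwable reported = new IOException(
            "Couldn't create proxy provider",
            new InvocationTargetException(
                new IllegalArgumentException(
                    new UnknownHostException("standbynamenode"))));
        System.out.println(causedByUnknownHost(reported));
        System.out.println(rootCause(reported).getMessage());
    }
}
```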
      

        Attachments

        1. HADOOP-15864-branch.2.7.002.patch
          3 kB
          He Xiaoqiao
        2. HADOOP-15864-branch.2.7.001.patch
          1 kB
          He Xiaoqiao
        3. HADOOP-15864.branch.2.7.004.patch
          3 kB
          He Xiaoqiao
        4. HADOOP-15864.005.patch
          6 kB
          He Xiaoqiao
        5. HADOOP-15864.004.patch
          6 kB
          He Xiaoqiao
        6. HADOOP-15864.003.patch
          3 kB
          He Xiaoqiao

              People

              • Assignee:
                He Xiaoqiao (hexiaoqiao)
              • Reporter:
                He Xiaoqiao (hexiaoqiao)
              • Votes:
                0
              • Watchers:
                8