Hadoop HDFS / HDFS-15250

Setting `dfs.client.use.datanode.hostname` to true can crash the system because of unhandled UnresolvedAddressException

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.2, 3.3.1, 3.4.0
    • Fix Version/s: 3.2.2, 3.3.1, 3.4.0
    • Component/s: hdfs-client
    • Labels: None

    Description

      Problem:

      `dfs.client.use.datanode.hostname` defaults to false, which means the client connects to a datanode using the datanode's IP address rather than its hostname.
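
      For reference, a minimal sketch of flipping this setting on the client side (the configuration key is the standard one; the surrounding code is illustrative only):

       import org.apache.hadoop.conf.Configuration;

       Configuration conf = new Configuration();
       // Default is false: the client connects to datanodes by IP address.
       // When set to true, the client connects using the datanode's hostname.
       conf.setBoolean("dfs.client.use.datanode.hostname", true);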

      In `org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer`:

       

       try {
         Peer peer = remotePeerFactory.newConnectedPeer(inetSocketAddress, token,
             datanode);
         LOG.trace("nextTcpPeer: created newConnectedPeer {}", peer);
         return new BlockReaderPeer(peer, false);
       } catch (IOException e) {
         LOG.trace("nextTcpPeer: failed to create newConnectedPeer connected to"
             + "{}", datanode);
         throw e;
       }
      

       

      If `dfs.client.use.datanode.hostname` is false, the client tries to connect via the IP address. If the IP address is invalid and the connection fails, an IOException is thrown from `newConnectedPeer` and handled.

      If `dfs.client.use.datanode.hostname` is true, the client tries to connect via the hostname. If the hostname cannot be resolved, an UnresolvedAddressException is thrown from `newConnectedPeer`. However, UnresolvedAddressException is not a subclass of IOException, so `nextTcpPeer` does not handle this exception at all, and the unhandled exception can crash the system.
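
      To illustrate the underlying JDK behavior, here is a standalone example (the class name, hostname and port are made up):

       import java.net.InetSocketAddress;
       import java.nio.channels.SocketChannel;
       import java.nio.channels.UnresolvedAddressException;

       public class UnresolvedDemo {
         public static void main(String[] args) throws Exception {
           // An unresolvable hostname leaves the InetSocketAddress "unresolved"
           // rather than throwing here.
           InetSocketAddress addr =
               new InetSocketAddress("no-such-datanode.invalid", 9866);
           try (SocketChannel ch = SocketChannel.open()) {
             // connect() throws UnresolvedAddressException for an unresolved
             // address. It extends IllegalArgumentException (a RuntimeException),
             // not IOException, so a catch (IOException e) clause never sees it.
             ch.connect(addr);
           } catch (UnresolvedAddressException e) {
             System.out.println("caught: " + e);
           }
         }
       }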

       

      Solution:

      Since the method already handles an invalid IP address, an unresolvable hostname should be handled as well. One solution is to add the handling logic in `nextTcpPeer`:

       } catch (IOException e) {
         LOG.trace("nextTcpPeer: failed to create newConnectedPeer connected to"
             + "{}", datanode);
         throw e;
       } catch (UnresolvedAddressException e) {
         ... // handling logic 
       }
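
      One possible shape for that handling logic, shown only as a sketch (wrapping the exception into an IOException is an assumption for illustration, not necessarily the committed patch):

       } catch (UnresolvedAddressException e) {
         LOG.trace("nextTcpPeer: failed to resolve hostname for {}", datanode);
         // Wrapping preserves the method's IOException contract, so callers that
         // already retry on IOException (for example by trying another replica)
         // could recover.
         throw new IOException("Cannot resolve datanode hostname: " + datanode, e);
       }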

      I am very happy to provide a patch to do this.

      Attachments

        1. HDFS-15250-001.patch
          1 kB
          Ctest
        2. HDFS-15250-002.patch
          1 kB
          Ctest

        Activity

          ayushtkn Ayush Saxena added a comment -

          I am very happy to provide a patch to do this.

          Go ahead!!!

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 1m 50s Docker mode activated.
                Prechecks
          +1 dupname 0m 0s No case conflicting files found.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
                trunk Compile Tests
          +1 mvninstall 24m 14s trunk passed
          +1 compile 0m 55s trunk passed
          +1 checkstyle 0m 30s trunk passed
          +1 mvnsite 1m 9s trunk passed
          +1 shadedclient 19m 7s branch has no errors when building and testing our client artifacts.
          +1 javadoc 0m 36s trunk passed
          0 spotbugs 3m 21s Used deprecated FindBugs config; considering switching to SpotBugs.
          +1 findbugs 3m 17s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 59s the patch passed
          +1 compile 0m 51s the patch passed
          +1 javac 0m 52s the patch passed
          +1 checkstyle 0m 19s the patch passed
          +1 mvnsite 0m 51s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 shadedclient 16m 37s patch has no errors when building and testing our client artifacts.
          +1 javadoc 0m 31s the patch passed
          +1 findbugs 2m 56s the patch passed
                Other Tests
          -1 unit 2m 2s hadoop-hdfs-client in the patch passed.
          +1 asflicense 0m 29s The patch does not generate ASF License warnings.
          77m 49s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider



          Subsystem Report/Notes
          Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29243/artifact/out/Dockerfile
          JIRA Issue HDFS-15250
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/13002241/HDFS-15250-001.patch
          Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
          uname Linux dd1af6bbb7e2 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality personality/hadoop.sh
          git revision trunk / 35010120fbb
          Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/29243/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-client.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/29243/testReport/
          Max. process+thread count 308 (vs. ulimit of 5500)
          modules C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/29243/console
          versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
          Powered by Apache Yetus 0.12.0 https://yetus.apache.org

          This message was automatically generated.

          ayushtkn Ayush Saxena added a comment -

          Thanx ctest.team for the patch.
          We can use the same catch block; having a separate one won't make much difference, and the trace message is enough to convey that the connection failed.
          You can change it like this:

              } catch (IOException | UnresolvedAddressException e) {
          
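          Put together, the suggested change would read roughly as follows (a sketch based on the snippet in the description; `throw e` still compiles under Java's precise-rethrow rules even though the method only declares IOException, because UnresolvedAddressException is unchecked):

              try {
                Peer peer = remotePeerFactory.newConnectedPeer(inetSocketAddress, token,
                    datanode);
                LOG.trace("nextTcpPeer: created newConnectedPeer {}", peer);
                return new BlockReaderPeer(peer, false);
              } catch (IOException | UnresolvedAddressException e) {
                LOG.trace("nextTcpPeer: failed to create newConnectedPeer connected to"
                    + "{}", datanode);
                throw e;
              }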
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 1m 25s Docker mode activated.
                Prechecks
          +1 dupname 0m 0s No case conflicting files found.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
                trunk Compile Tests
          +1 mvninstall 21m 7s trunk passed
          +1 compile 0m 49s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 53s trunk passed
          +1 shadedclient 16m 38s branch has no errors when building and testing our client artifacts.
          +1 javadoc 0m 31s trunk passed
          0 spotbugs 2m 22s Used deprecated FindBugs config; considering switching to SpotBugs.
          +1 findbugs 2m 20s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 48s the patch passed
          +1 compile 0m 45s the patch passed
          +1 javac 0m 45s the patch passed
          +1 checkstyle 0m 17s the patch passed
          +1 mvnsite 0m 46s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 shadedclient 15m 32s patch has no errors when building and testing our client artifacts.
          +1 javadoc 0m 28s the patch passed
          +1 findbugs 2m 27s the patch passed
                Other Tests
          +1 unit 1m 58s hadoop-hdfs-client in the patch passed.
          +1 asflicense 0m 30s The patch does not generate ASF License warnings.
          68m 29s



          Subsystem Report/Notes
          Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29259/artifact/out/Dockerfile
          JIRA Issue HDFS-15250
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/13002512/HDFS-15250-002.patch
          Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
          uname Linux a05100928763 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality personality/hadoop.sh
          git revision trunk / cb64e993c27
          Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/29259/testReport/
          Max. process+thread count 295 (vs. ulimit of 5500)
          modules C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/29259/console
          versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
          Powered by Apache Yetus 0.12.0 https://yetus.apache.org

          This message was automatically generated.

          ayushtkn Ayush Saxena added a comment -

          Committed to trunk, branch-3.3, branch-3.2 and branch-3.1. Thanx ctest.team for the contribution.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18231 (See https://builds.apache.org/job/Hadoop-trunk-Commit/18231/)
          HDFS-15250. Setting `dfs.client.use.datanode.hostname` to true can crash (ayushsaxena: rev aab9e0b16ecc8fa00228c00c7ab90e55195cf5f4)

          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderFactory.java

          timoha Andrey Elenskiy added a comment -

          We've run into the same issue on 3.1.3 and ended up getting UnresolvedAddressException propagated all the way to clients (readers and writers) even if only one block location could not be resolved. So the entire read/write fails if one datanode in the pipeline causes UnresolvedAddressException.

          I see the patch doesn't actually handle this exception but just logs in TRACE and rethrows it, so would we expect to see the same problem?

          I can also try out the patch on our system as it's fairly easy to reproduce in case you think this change is enough.
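
          For illustration only, this is roughly how the failure surfaces to an application if the exception really does propagate out of the read as described (the class name, path and setup here are hypothetical):

              import java.nio.channels.UnresolvedAddressException;
              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.fs.FSDataInputStream;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;
              import org.apache.hadoop.io.IOUtils;

              public class ReadDemo {
                public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();
                  conf.setBoolean("dfs.client.use.datanode.hostname", true);
                  FileSystem fs = FileSystem.get(conf);
                  try (FSDataInputStream in = fs.open(new Path("/data/example.bin"))) {
                    IOUtils.copyBytes(in, System.out, 4096, false);
                  } catch (UnresolvedAddressException e) {
                    // The unchecked exception reaches application code instead of being
                    // converted into an IOException that the client's replica-failover
                    // logic would handle.
                    System.err.println("Read failed, unresolved datanode hostname: " + e);
                  }
                }
              }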

          tsuna Benoit Sigoure added a comment -

          This issue is not fixed. Merely catching and logging the exception just to re-raise it doesn't solve the problem. The exception ends up preventing HDFS reads from succeeding when one of the replicas is unavailable due to UnresolvedAddressException, even though there could be other replicas available.


          sodonnell Stephen O'Donnell added a comment -

          I am reviewing some backports and came across this one. The change here does not seem to fix anything, as a couple of people have stated. Has anyone got a stack trace from an occurrence of this error so we can see where it fails exactly?

          ctest.team Ctest added a comment -

          Hello sodonnell

          Sorry that we didn't keep the stack trace of this issue.

          All I remember is that we set `dfs.client.use.datanode.hostname` to true and set the hostname of the datanode incorrectly, which triggered the exception.

          I think the system throws the correct exception here, but probably needs to handle it better.

           


          People

            Assignee: ctest.team Ctest
            Reporter: ctest.team Ctest
            Votes: 0
            Watchers: 7
