  1. Hadoop HDFS
  2. HDFS-6616

bestNode shouldn't always return the first DataNode

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0
    • Component/s: webhdfs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When running distcp between clusters, the job failed:
      014-06-30 20:56:28,430 INFO org.apache.hadoop.tools.DistCp: FAIL part-r-00101.avro : java.net.NoRouteToHostException: No route to host
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
      at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
      at java.security.AccessController.doPrivileged(Native Method)
      at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
      at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
      at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
      at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:322)
      at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
      at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:419)
      at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:547)
      at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:314)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
      at org.apache.hadoop.mapred.Child.main(Child.java:249)

      The root cause is that one of the DataNodes cannot be reached from outside the cluster, although it is healthy inside the cluster.
      In NamenodeWebHdfsMethods.java, bestNode always returns the first DataNode, so the copy still fails even after distcp retries.
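
      One way to avoid pinning every retry to the same replica is to choose randomly among the candidates and skip any node the client has already found unreachable. The following is a minimal sketch of that idea only, not the attached patch: the class name BestNodeSketch, the excluded set, and the use of plain host strings in place of DataNode descriptors are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Illustrative only: hypothetical names, not the actual NamenodeWebHdfsMethods code. */
class BestNodeSketch {
    private static final Random RANDOM = new Random();

    /**
     * Picks a random candidate that is not in the excluded set.
     * Hosts are plain strings here (an assumption for the sketch).
     */
    static String bestNode(List<String> candidates, Set<String> excluded) {
        List<String> usable = new ArrayList<>();
        for (String node : candidates) {
            if (!excluded.contains(node)) {
                usable.add(node);
            }
        }
        if (usable.isEmpty()) {
            throw new IllegalStateException("No usable DataNode among " + candidates);
        }
        // Random choice so that retries are not pinned to the first replica.
        return usable.get(RANDOM.nextInt(usable.size()));
    }
}
```

      With candidates dn1, dn2, dn3 and dn1 excluded, repeated calls return only dn2 or dn3, so a retry after a NoRouteToHostException on dn1 can reach a different replica.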

        Attachments

        1. HDFS-6616.1.patch
          22 kB
          yunjiong zhao
        2. HDFS-6616.2.patch
          25 kB
          yunjiong zhao
        3. HDFS-6616.3.patch
          25 kB
          yunjiong zhao
        4. HDFS-6616.patch
          2 kB
          yunjiong zhao

              People

              • Assignee:
                yunjiong zhao
                Reporter:
                yunjiong zhao
              • Votes:
                0
                Watchers:
                7
