Hadoop HDFS / HDFS-9634

webhdfs client side exceptions don't provide enough details

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0, 2.7.1, 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
    • Component/s: webhdfs
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      When a WebHDFS client-side exception (for example, a read timeout) occurs, there are no details beyond the fact that a timeout occurred. Ideally the message should say which node is responsible for the timeout, but failing that it should at least say which node we're talking to, so we can examine that node's logs to investigate further.

      java.net.SocketTimeoutException: Read timed out
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(SocketInputStream.java:150)
          at java.net.SocketInputStream.read(SocketInputStream.java:121)
          at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
          at sun.net.www.MeteredStream.read(MeteredStream.java:134)
          at java.io.FilterInputStream.read(FilterInputStream.java:133)
          at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3035)
          at org.apache.commons.io.input.BoundedInputStream.read(BoundedInputStream.java:121)
          at org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:188)
          at java.io.DataInputStream.read(DataInputStream.java:149)
          at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
          at com.yahoo.grid.tools.util.io.ThrottledBufferedInputStream.read(ThrottledBufferedInputStream.java:58)
          at java.io.FilterInputStream.read(FilterInputStream.java:107)
          at com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.copyBytes(HFTPDistributedCopy.java:495)
          at com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.doCopy(HFTPDistributedCopy.java:440)
          at com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.access$200(HFTPDistributedCopy.java:57)
          at com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy$1.doExecute(HFTPDistributedCopy.java:387)
      ... 12 more
      

      There are no clues as to which datanode we're talking to nor which datanode was responsible for the timeout.
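One way to surface the node is sketched below. This is a minimal illustration, not the committed patch: the class `NodeAwareException` and the helper `withNode` are hypothetical names, and `java.net.URI` is used only for brevity. The sketch rebuilds an exception of the same class with a `host:port: ` prefix and falls back to the original exception if that is not possible.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.net.URI;

public class NodeAwareException {
  /**
   * Hypothetical helper: returns an exception of the same class as {@code ioe}
   * whose message is prefixed with the authority (host:port) of {@code uri}.
   * If the class has no public (String) constructor, the original is returned.
   */
  static IOException withNode(URI uri, IOException ioe) {
    String node = uri.getAuthority();
    try {
      IOException wrapped = ioe.getClass()
          .getConstructor(String.class)
          .newInstance(node + ": " + ioe.getMessage());
      // Keep the original exception reachable as the cause.
      wrapped.initCause(ioe);
      return wrapped;
    } catch (ReflectiveOperationException | RuntimeException e) {
      // Could not rebuild the exception; losing the prefix is better than losing the error.
      return ioe;
    }
  }

  public static void main(String[] args) {
    // The URI below is an illustrative value, not taken from a real cluster.
    URI uri = URI.create("http://localhost:58086/webhdfs/v1/tmp/file?op=OPEN");
    IOException e = withNode(uri, new SocketTimeoutException("Read timed out"));
    System.out.println(e.getMessage()); // localhost:58086: Read timed out
  }
}
```

With a message in this form, a reader of the client log can tell which node's logs to inspect next.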

      1. HDFS-9634.001.patch
        5 kB
        Eric Payne
      2. HDFS-9634.002.patch
        5 kB
        Eric Payne

        Activity

        eepayne Eric Payne added a comment -

        Daryn Sharp, Kihwal Lee, and Jason Lowe:
        Attached HDFS-9634.001.patch

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote | Subsystem | Runtime | Comment
        -1 | patch | 0m 4s | HDFS-9634 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



        Subsystem Report/Notes
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12781698/HDFS-9634.001.patch
        JIRA Issue HDFS-9634
        Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14096/console

        This message was automatically generated.

        eepayne Eric Payne added a comment -

        Attaching HDFS-9634.002.patch. Sorry about the previous bad patch.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote | Subsystem | Runtime | Comment
        0 | reexec | 0m 0s | Docker mode activated.
        +1 | @author | 0m 0s | The patch does not contain any @author tags.
        +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files.
        +1 | mvninstall | 7m 49s | trunk passed
        +1 | compile | 1m 48s | trunk passed with JDK v1.8.0_66
        +1 | compile | 1m 34s | trunk passed with JDK v1.7.0_91
        +1 | checkstyle | 0m 24s | trunk passed
        +1 | mvnsite | 1m 28s | trunk passed
        +1 | mvneclipse | 0m 25s | trunk passed
        +1 | findbugs | 3m 41s | trunk passed
        +1 | javadoc | 1m 33s | trunk passed with JDK v1.8.0_66
        +1 | javadoc | 2m 14s | trunk passed with JDK v1.7.0_91
        +1 | mvninstall | 1m 16s | the patch passed
        +1 | compile | 1m 41s | the patch passed with JDK v1.8.0_66
        +1 | javac | 1m 41s | the patch passed
        +1 | compile | 1m 34s | the patch passed with JDK v1.7.0_91
        +1 | javac | 1m 34s | the patch passed
        -1 | checkstyle | 0m 24s | Patch generated 1 new checkstyle issues in hadoop-hdfs-project (total was 61, now 62).
        +1 | mvnsite | 1m 32s | the patch passed
        +1 | mvneclipse | 0m 22s | the patch passed
        -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
        +1 | findbugs | 5m 8s | the patch passed
        +1 | javadoc | 2m 52s | the patch passed with JDK v1.8.0_66
        +1 | javadoc | 2m 55s | the patch passed with JDK v1.7.0_91
        +1 | unit | 2m 2s | hadoop-hdfs-client in the patch passed with JDK v1.8.0_66.
        -1 | unit | 72m 16s | hadoop-hdfs in the patch failed with JDK v1.8.0_66.
        +1 | unit | 1m 10s | hadoop-hdfs-client in the patch passed with JDK v1.7.0_91.
        -1 | unit | 73m 14s | hadoop-hdfs in the patch failed with JDK v1.7.0_91.
        +1 | asflicense | 0m 29s | Patch does not generate ASF License warnings.
         |  | 191m 55s |



        Reason | Tests
        JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
         | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
         | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark
         | hadoop.hdfs.TestDFSUpgradeFromImage
         | hadoop.hdfs.server.namenode.ha.TestHAMetrics
         | hadoop.hdfs.server.datanode.TestBlockScanner
        JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockManager
         | hadoop.hdfs.server.namenode.ha.TestHAAppend
         | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark
         | hadoop.hdfs.TestErasureCodeBenchmarkThroughput
         | hadoop.hdfs.server.namenode.TestRecoverStripedBlocks
         | hadoop.hdfs.server.datanode.TestBlockScanner



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12781843/HDFS-9634.002.patch
        JIRA Issue HDFS-9634
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux baa12ec334cd 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 13de835
        Default Java 1.7.0_91
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14100/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
        whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/14100/artifact/patchprocess/whitespace-eol.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/14100/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/14100/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14100/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14100/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
        JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14100/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project
        Max memory used 76MB
        Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14100/console

        This message was automatically generated.

        eepayne Eric Payne added a comment -

        All of the unit tests listed above passed when I ran them in my local build environment.

        kshukla Kuhu Shukla added a comment -

        +1 (non-binding)
        I had questions about why we would use getAuthority() over getHost() but it does make more sense to have the ports along with the hostname. Thanks Eric Payne for clarifying that.
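The difference between the two accessors can be seen in a small sketch. The class name `AuthorityVsHost`, the helper names, and the URL are illustrative assumptions; `java.net.URI` is used here for brevity, and `java.net.URL` offers accessors with the same names.

```java
import java.net.URI;

public class AuthorityVsHost {
  // Host name only, without the port.
  static String host(String url) {
    return URI.create(url).getHost();
  }

  // Host name plus port, which is what identifies a specific daemon on a node.
  static String authority(String url) {
    return URI.create(url).getAuthority();
  }

  public static void main(String[] args) {
    // Hypothetical datanode address, used only for illustration.
    String url = "http://dn1.example.com:50075/webhdfs/v1/user/foo?op=OPEN";
    System.out.println(host(url));      // dn1.example.com
    System.out.println(authority(url)); // dn1.example.com:50075
  }
}
```

Including the port matters when several services, or several test servers on localhost, share one host name.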

        kihwal Kihwal Lee added a comment -

        +1 looks good to me.

        kihwal Kihwal Lee added a comment -

        When I tried it on 2.7, the new test case fails. It is passing in trunk, of course.

        TestWebHdfsTimeouts.testReadTimeout:131 expected:<localhost:58086: [Read timed out]> but was:<localhost:58086: [null]>
        
        shahrs87 Rushabh S Shah added a comment -

        The patch looks good to me.
        Ran the test case on trunk and on branch-2.7 multiple times.
        It ran successfully every time.
        +1 (non-binding).

        kihwal Kihwal Lee added a comment -

        I used jdk8 and ran TestWebHdfsTimeouts. testReadTimeout would fail occasionally, but if I run it alone (mvn test -Dtest=TestWebHdfsTimeouts#testReadTimeout), it passes 100% of the time. So I think it is failing due to interactions with other test cases, and it probably only happens with jdk8.

        Eric Payne If you find the cause of the test breakage, please file a jira. I don't think that blocks this jira, but will wait for your analysis before committing.

        eepayne Eric Payne added a comment -

        Thanks a lot, Kihwal Lee and Rushabh S Shah. I have run TestWebHdfsTimeouts several times with both java 2.7 and 2.8, and it always succeeds for me.

        eepayne Eric Payne added a comment -

        Sorry, that should have been java 1.7 and 1.8.

        kihwal Kihwal Lee added a comment -

        Ok, it might be my env. I will commit it and will keep an eye on 2.7 builds.

        kihwal Kihwal Lee added a comment -

        I've committed the fix to trunk, branch-2, branch-2.8 and branch-2.7.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9149 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9149/)
        HDFS-9634. webhdfs client side exceptions don't provide enough details. (kihwal: rev 3616c7b855962014750a3259a64c6e2a147da884)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTimeouts.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
        jojochuang Wei-Chiu Chuang added a comment -

        This seems to contain a bug: After the exception is reinterpreted, the original stack trace is lost, and it's impossible to tell where the exception occurred.

        eepayne Eric Payne added a comment -

        Wei-Chiu Chuang, Thanks for reviewing the functionality of this patch.

        Did you use the FS Shell to test this (i.e., hadoop fs -cat webhdfs://SOMEHOST/MyHome/myfile.txt)? If so, the FS Shell swallows the stack trace and just prints the message when accessing files via webhdfs. It has done this for a long time. In my test environment, I reverted this change, and FS Shell behaves the same way.

        When I write a test java program, read a file via webhdfs, and inject an error, I do get the whole stack trace, including the cause message.
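The difference between printing only the message and printing the full trace can be illustrated with a small stand-alone sketch. It does not use WebHDFS or the FS Shell; the class `CauseChainDemo` and its helpers are hypothetical, and the rewrapping shown is only one possible approach, not necessarily what the patch does.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.SocketTimeoutException;

public class CauseChainDemo {
  // Hypothetical rewrap: new message with the node prefix, original kept as the cause.
  static IOException rewrap(String node, IOException original) {
    IOException wrapped = new IOException(node + ": " + original.getMessage());
    wrapped.initCause(original);
    return wrapped;
  }

  // Renders the full stack trace, including any "Caused by:" sections, as a string.
  static String fullTrace(Throwable t) {
    StringWriter sw = new StringWriter();
    t.printStackTrace(new PrintWriter(sw, true));
    return sw.toString();
  }

  public static void main(String[] args) {
    IOException e = rewrap("localhost:58086",
        new SocketTimeoutException("Read timed out"));

    // A caller that prints only the message shows no stack trace at all.
    System.err.println(e.getMessage());

    // A caller that prints the exception itself also sees the original
    // exception and its frames under "Caused by:".
    System.err.print(fullTrace(e));
  }
}
```

If the original exception is attached as the cause, its frames stay available to any caller that prints the whole exception rather than just the message.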

        jojochuang Wei-Chiu Chuang added a comment -

        Thanks for the explanation, Eric.

        I did make the observation in tests (HDFS-9905). If it's been the case all along, then it's probably a better idea not to change it.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Closing the JIRA as part of 2.7.3 release.


          People

          • Assignee: Eric Payne (eepayne)
          • Reporter: Eric Payne (eepayne)
          • Votes: 0
          • Watchers: 10
