Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-9905

WebHdfsFileSystem#runWithRetry should display original stack trace on error

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.3
    • Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
    • Component/s: test
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When checking for a timeout in TestWebHdfsTimeouts, it does get SocketTimeoutException, but the message sometimes does not contain "connect timed out". Since the original exception is not logged, we do not know details.

      1. HDFS-9905.001.patch
        4 kB
        Wei-Chiu Chuang
      2. HDFS-9905.002.patch
        5 kB
        Wei-Chiu Chuang
      3. HDFS-9905-branch-2.7.002.patch
        5 kB
        Eric Payne

        Activity

        Hide
        kihwal Kihwal Lee added a comment -

        This is from a precommit for HDFS-9239.

        Running org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts
        Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.611 sec <<< FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts
        testAuthUrlReadTimeout[timeoutSource=ConnectionFactory](org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts)  Time elapsed: 0.083 sec  <<< FAILURE!
        org.junit.ComparisonFailure: expected:<localhost:37255: [Read timed out]> but was:<localhost:37255: [null]>
        	at org.junit.Assert.assertEquals(Assert.java:115)
        	at org.junit.Assert.assertEquals(Assert.java:144)
        	at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testAuthUrlReadTimeout(TestWebHdfsTimeouts.java:195)
        

        We also saw this from our own build.

        org.junit.ComparisonFailure: expected:<localhost:38830: [connect timed out]> but was:<localhost:38830: [null]>
        	at org.junit.Assert.assertEquals(Assert.java:115)
        	at org.junit.Assert.assertEquals(Assert.java:144)
        	at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testTwoStepWriteConnectTimeout(TestWebHdfsTimeouts.java:206)
        
        Show
        kihwal Kihwal Lee added a comment - This is from a precommit for HDFS-9239 . Running org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.611 sec <<< FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts testAuthUrlReadTimeout[timeoutSource=ConnectionFactory](org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts) Time elapsed: 0.083 sec <<< FAILURE! org.junit.ComparisonFailure: expected:<localhost:37255: [Read timed out]> but was:<localhost:37255: [null]> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testAuthUrlReadTimeout(TestWebHdfsTimeouts.java:195) We also saw this from our own build. org.junit.ComparisonFailure: expected:<localhost:38830: [connect timed out]> but was:<localhost:38830: [null]> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testTwoStepWriteConnectTimeout(TestWebHdfsTimeouts.java:206)
        Hide
        jojochuang Wei-Chiu Chuang added a comment -

        Thanks Kihwal Lee for reporting this. I am assigning this to myself.
        I am also seeing test failures in TestWebHdfsTimeouts. But in our jenkins we see it repeated fail after March 2.

        Looks like a regression introduced after HDFS-9887.

        I think the Timeout parameter in the tests should be more than 60 seconds, otherwise surefire would erroneously thought tests ran out of time.

        Show
        jojochuang Wei-Chiu Chuang added a comment - Thanks Kihwal Lee for reporting this. I am assigning this to myself. I am also seeing test failures in TestWebHdfsTimeouts. But in our jenkins we see it repeated fail after March 2. Looks like a regression introduced after HDFS-9887 . I think the Timeout parameter in the tests should be more than 60 seconds, otherwise surefire would erroneously thought tests ran out of time.
        Hide
        jojochuang Wei-Chiu Chuang added a comment -

        I will modify the test cases to use GenericTestUtils#assertExceptionContains, which prints detailed exception if the error message doesn't match.

        Show
        jojochuang Wei-Chiu Chuang added a comment - I will modify the test cases to use GenericTestUtils#assertExceptionContains , which prints detailed exception if the error message doesn't match.
        Hide
        jojochuang Wei-Chiu Chuang added a comment -

        This is the full stacktrace: unfortunately, the exception object was reinterpreted in the exception handling, so the original stack trace was lost.
        We might also want to retain the original full stack trace.

        Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.555 sec <<< FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts
        testAuthUrlReadTimeout[timeoutSource=Configuration](org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts)  Time elapsed: 0.036 sec  <<< FAILURE!
        java.lang.AssertionError: Expected to find 'localhost:38159: Read timed out' but got unexpected exception:java.net.SocketTimeoutException: localhost:38159: null
                at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source)
                at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
                at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
                at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:733)
                at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:555)
                at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:586)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
                at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:582)
                at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1456)
                at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testAuthUrlReadTimeout(TestWebHdfsTimeouts.java:195)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:606)
                at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
                at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
                at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
                at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
                at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
        
                at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source)
                at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
                at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
                at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:733)
                at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:555)
                at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:586)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
                at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:582)
                at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1456)
                at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testAuthUrlReadTimeout(TestWebHdfsTimeouts.java:195)
        
        Show
        jojochuang Wei-Chiu Chuang added a comment - This is the full stacktrace: unfortunately, the exception object was reinterpreted in the exception handling, so the original stack trace was lost. We might also want to retain the original full stack trace. Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.555 sec <<< FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts testAuthUrlReadTimeout[timeoutSource=Configuration](org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts) Time elapsed: 0.036 sec <<< FAILURE! java.lang.AssertionError: Expected to find 'localhost:38159: Read timed out' but got unexpected exception:java.net.SocketTimeoutException: localhost:38159: null at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:733) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:555) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:586) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:582) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1456) at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testAuthUrlReadTimeout(TestWebHdfsTimeouts.java:195) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:733) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:555) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:586) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:582) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1456) at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testAuthUrlReadTimeout(TestWebHdfsTimeouts.java:195)
        Hide
        kihwal Kihwal Lee added a comment -

        Looks like a regression introduced after HDFS-9887

        We were seeing this failure in the 2.7 builds, which does not have HDFS-9887.

        Show
        kihwal Kihwal Lee added a comment - Looks like a regression introduced after HDFS-9887 We were seeing this failure in the 2.7 builds, which does not have HDFS-9887 .
        Hide
        jojochuang Wei-Chiu Chuang added a comment -

        Kihwal Lee: you are right. This is unrelated to HDFS-9887, even though HDFS-9887 also contains another bug unrelated to this issue.
        But thanks to your comments. I think I should rebase to 2.7 to avoid exception reinterpretation in HDFS-9634.

        Show
        jojochuang Wei-Chiu Chuang added a comment - Kihwal Lee : you are right. This is unrelated to HDFS-9887 , even though HDFS-9887 also contains another bug unrelated to this issue. But thanks to your comments. I think I should rebase to 2.7 to avoid exception reinterpretation in HDFS-9634 .
        Hide
        jojochuang Wei-Chiu Chuang added a comment -

        Use GenericTestUtils.assertExceptionContains instead of Assert.assertEquals so that if the string doesn't match, it logs the exception.

        Show
        jojochuang Wei-Chiu Chuang added a comment - Use GenericTestUtils.assertExceptionContains instead of Assert.assertEquals so that if the string doesn't match, it logs the exception.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 11s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 6m 32s trunk passed
        +1 compile 0m 38s trunk passed with JDK v1.8.0_77
        +1 compile 0m 41s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 21s trunk passed
        +1 mvnsite 0m 50s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        +1 findbugs 1m 57s trunk passed
        +1 javadoc 1m 5s trunk passed with JDK v1.8.0_77
        +1 javadoc 1m 46s trunk passed with JDK v1.7.0_95
        +1 mvninstall 0m 44s the patch passed
        +1 compile 0m 36s the patch passed with JDK v1.8.0_77
        +1 javac 0m 36s the patch passed
        +1 compile 0m 38s the patch passed with JDK v1.7.0_95
        +1 javac 0m 38s the patch passed
        +1 checkstyle 0m 18s the patch passed
        +1 mvnsite 0m 48s the patch passed
        +1 mvneclipse 0m 10s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 2m 8s the patch passed
        +1 javadoc 1m 4s the patch passed with JDK v1.8.0_77
        +1 javadoc 1m 42s the patch passed with JDK v1.7.0_95
        -1 unit 57m 22s hadoop-hdfs in the patch failed with JDK v1.8.0_77.
        -1 unit 62m 51s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
        +1 asflicense 0m 26s Patch does not generate ASF License warnings.
        145m 9s



        Reason Tests
        JDK v1.8.0_77 Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeMetrics
          hadoop.fs.contract.hdfs.TestHDFSContractSeek
          hadoop.hdfs.server.namenode.TestStartup
        JDK v1.7.0_95 Failed junit tests hadoop.fs.contract.hdfs.TestHDFSContractSeek
          hadoop.hdfs.server.balancer.TestBalancer
        JDK v1.7.0_95 Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:fbe3e86
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12797802/HDFS-9905.001.patch
        JIRA Issue HDFS-9905
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 0174c705dd6a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 843ee8d
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/15123/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/15123/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15123/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15123/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15123/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15123/console
        Powered by Apache Yetus 0.2.0 http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 11s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 6m 32s trunk passed +1 compile 0m 38s trunk passed with JDK v1.8.0_77 +1 compile 0m 41s trunk passed with JDK v1.7.0_95 +1 checkstyle 0m 21s trunk passed +1 mvnsite 0m 50s trunk passed +1 mvneclipse 0m 14s trunk passed +1 findbugs 1m 57s trunk passed +1 javadoc 1m 5s trunk passed with JDK v1.8.0_77 +1 javadoc 1m 46s trunk passed with JDK v1.7.0_95 +1 mvninstall 0m 44s the patch passed +1 compile 0m 36s the patch passed with JDK v1.8.0_77 +1 javac 0m 36s the patch passed +1 compile 0m 38s the patch passed with JDK v1.7.0_95 +1 javac 0m 38s the patch passed +1 checkstyle 0m 18s the patch passed +1 mvnsite 0m 48s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 2m 8s the patch passed +1 javadoc 1m 4s the patch passed with JDK v1.8.0_77 +1 javadoc 1m 42s the patch passed with JDK v1.7.0_95 -1 unit 57m 22s hadoop-hdfs in the patch failed with JDK v1.8.0_77. -1 unit 62m 51s hadoop-hdfs in the patch failed with JDK v1.7.0_95. +1 asflicense 0m 26s Patch does not generate ASF License warnings. 145m 9s Reason Tests JDK v1.8.0_77 Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeMetrics   hadoop.fs.contract.hdfs.TestHDFSContractSeek   hadoop.hdfs.server.namenode.TestStartup JDK v1.7.0_95 Failed junit tests hadoop.fs.contract.hdfs.TestHDFSContractSeek   hadoop.hdfs.server.balancer.TestBalancer JDK v1.7.0_95 Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2 Subsystem Report/Notes Docker Image:yetus/hadoop:fbe3e86 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12797802/HDFS-9905.001.patch JIRA Issue HDFS-9905 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 0174c705dd6a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 843ee8d Default Java 1.7.0_95 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HDFS-Build/15123/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/15123/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15123/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15123/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15123/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15123/console Powered by Apache Yetus 0.2.0 http://yetus.apache.org This message was automatically generated.
        Hide
        iwasakims Masatake Iwasaki added a comment -

        Hmm... the stacktrace printed by GenericTestUtils.assertExceptionContains does not show the root cause because TestWebHdfsTimeouts#runWithRetry recreate SocketTimeoutException to add host address to the message.

        I got following stack by commenting out the recreating exception part of TestWebHdfsTimeouts#runWithRetry. java.net.SocksSocketImpl is possible to throw SocketTimeoutException with null message. We seem not to be able to expect that SocketTimeoutException always contains message such as "Read timed out" or "connect timed out".

        java.lang.AssertionError: Expected to find ': Read timed out' but got unexpected exception:java.net.SocketTimeoutException
        	at java.net.SocksSocketImpl.remainingMillis(SocksSocketImpl.java:111)
        	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        	at java.net.Socket.connect(Socket.java:589)
        	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        	at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
        	at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
        	at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
        	at sun.net.www.http.HttpClient.New(HttpClient.java:308)
        	at sun.net.www.http.HttpClient.New(HttpClient.java:326)
        	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
        	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
        	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
        	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
        	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:684)
        	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:637)
        	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:709)
        	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:555)
        	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:586)
        	at java.security.AccessController.doPrivileged(Native Method)
        	at javax.security.auth.Subject.doAs(Subject.java:422)
        	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
        	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:582)
        	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1466)
        	at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testAuthUrlReadTimeout(TestWebHdfsTimeouts.java:198)
        
        Show
        iwasakims Masatake Iwasaki added a comment - Hmm... the stacktrace printed by GenericTestUtils.assertExceptionContains does not show the root cause because TestWebHdfsTimeouts#runWithRetry recreate SocketTimeoutException to add host address to the message. I got following stack by commenting out the recreating exception part of TestWebHdfsTimeouts#runWithRetry . java.net.SocksSocketImpl is possible to throw SocketTimeoutException with null message. We seem not to be able to expect that SocketTimeoutException always contains message such as "Read timed out" or "connect timed out". java.lang.AssertionError: Expected to find ': Read timed out' but got unexpected exception:java.net.SocketTimeoutException at java.net.SocksSocketImpl.remainingMillis(SocksSocketImpl.java:111) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at sun.net.NetworkClient.doConnect(NetworkClient.java:175) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.<init>(HttpClient.java:211) at sun.net.www.http.HttpClient.New(HttpClient.java:308) at sun.net.www.http.HttpClient.New(HttpClient.java:326) at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169) at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105) at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999) at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:684) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:637) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:709) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:555) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:586) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:582) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1466) at org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts.testAuthUrlReadTimeout(TestWebHdfsTimeouts.java:198)
        Hide
        iwasakims Masatake Iwasaki added a comment -
          private static final int SHORT_SOCKET_TIMEOUT = 5;
        

        I had to decrease the value of SHORT_SOCKET_TIMEOUT to reproduce the issue in a few tries on my environment. Maybe just increasing the value to 20 or 30 is enough to avoid the issue even on heavily loaded build servers.

        Show
        iwasakims Masatake Iwasaki added a comment - private static final int SHORT_SOCKET_TIMEOUT = 5; I had to decrease the value of SHORT_SOCKET_TIMEOUT to reproduce the issue in a few tries on my environment. Maybe just increasing the value to 20 or 30 is enough to avoid the issue even on heavily loaded build servers.
        Hide
        kihwal Kihwal Lee added a comment -

        Eric Payne, do you by any chance have any idea about this test failures?

        Show
        kihwal Kihwal Lee added a comment - Eric Payne , do you by any chance have any idea about this test failures?
        Hide
        iwasakims Masatake Iwasaki added a comment -

        s/TestWebHdfsTimeouts#runWithRetry/WebHdfsTimeouts#runWithRetry/

        Show
        iwasakims Masatake Iwasaki added a comment - s/TestWebHdfsTimeouts#runWithRetry/WebHdfsTimeouts#runWithRetry/
        Hide
        iwasakims Masatake Iwasaki added a comment -

        ... s/TestWebHdfsTimeouts#runWithRetry/WebHdfsFileSystem#runWithRetry/

        Show
        iwasakims Masatake Iwasaki added a comment - ... s/TestWebHdfsTimeouts#runWithRetry/WebHdfsFileSystem#runWithRetry/
        Hide
        eepayne Eric Payne added a comment -

        java.net.SocksSocketImpl is possible to throw SocketTimeoutException with null message. We seem not to be able to expect that SocketTimeoutException always contains message such as "Read timed out" or "connect timed out".

        Use GenericTestUtils.assertExceptionContains instead of Assert.assertEquals so that if the string doesn't match, it logs the exception.

        Thanks, Masatake Iwasaki and Wei-Chiu Chuang for your work on this issue. I don't know what would cause SocketTimeoutException to give a null message instead of the expected Read timed out. However, your point about the original stack trace being lost is a very good one:

        the exception object was reinterpreted in the exception handling, so the original stack trace was lost.

        In WebHdfsFileSystem#AbstractRunner#runWithRetry, the code that recreates the exception with the node name should also propagate the stack trace:

                      ioe = ioe.getClass().getConstructor(String.class)
                            .newInstance(node + ": " + ioe.getMessage());
        

        Should be:

                      IOException newIoe =
                          ioe.getClass().getConstructor(String.class)
                            .newInstance(node + ": " + ioe.getMessage());
                      newIoe.setStackTrace(ioe.getStackTrace());
                      ioe = newIoe;
        

        I can open a separate JIRA for this if you want.

        Show
        eepayne Eric Payne added a comment - java.net.SocksSocketImpl is possible to throw SocketTimeoutException with null message. We seem not to be able to expect that SocketTimeoutException always contains message such as "Read timed out" or "connect timed out". Use GenericTestUtils.assertExceptionContains instead of Assert.assertEquals so that if the string doesn't match, it logs the exception. Thanks, Masatake Iwasaki and Wei-Chiu Chuang for your work on this issue. I don't know what would cause SocketTimeoutException to give a null message instead of the expected Read timed out . However, your point about the original stack trace being lost is a very good one: the exception object was reinterpreted in the exception handling, so the original stack trace was lost. In WebHdfsFileSystem#AbstractRunner#runWithRetry , the code that recreates the exception with the node name should also propagate the stack trace: ioe = ioe.getClass().getConstructor( String .class) .newInstance(node + ": " + ioe.getMessage()); Should be: IOException newIoe = ioe.getClass().getConstructor( String .class) .newInstance(node + ": " + ioe.getMessage()); newIoe.setStackTrace(ioe.getStackTrace()); ioe = newIoe; I can open a separate JIRA for this if you want.
        Hide
        iwasakims Masatake Iwasaki added a comment -

        I don't know what would cause SocketTimeoutException to give a null message instead of the expected Read timed out.

        Though underlying implementation of PlainSocketImpl and SocketInputStream throws SocketTimeoutException with expected message, SocketTimeoutException without message could be thrown by SocksSocketImpl#remainingMillis before reaching to those code paths if connect timeout is set to very small value.

        I'm +1 on the fix of WebHdfsFileSystem#AbstractRunner#runWithRetry suggested by Eric Payne in addition to 001.

        Show
        iwasakims Masatake Iwasaki added a comment - I don't know what would cause SocketTimeoutException to give a null message instead of the expected Read timed out. Though underlying implementation of PlainSocketImpl and SocketInputStream throws SocketTimeoutException with expected message, SocketTimeoutException without message could be thrown by SocksSocketImpl#remainingMillis before reaching to those code paths if connect timeout is set to very small value. I'm +1 on the fix of WebHdfsFileSystem#AbstractRunner#runWithRetry suggested by Eric Payne in addition to 001.
        Hide
        iwasakims Masatake Iwasaki added a comment -

        Wei-Chiu Chuang, can you update the patch to address comments above?

        Show
        iwasakims Masatake Iwasaki added a comment - Wei-Chiu Chuang , can you update the patch to address comments above?
        Hide
        jojochuang Wei-Chiu Chuang added a comment -

        Posted patch v02. Thanks Eric Payne and Masatake Iwasaki.
        Tested local against all WebHdfs tests.

        I've not yet been able to reproduce the test failure in TestWebHdfsTimeouts though.

        Show
        jojochuang Wei-Chiu Chuang added a comment - Posted patch v02. Thanks Eric Payne and Masatake Iwasaki . Tested local against all WebHdfs tests. I've not yet been able to reproduce the test failure in TestWebHdfsTimeouts though.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 12s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        0 mvndep 0m 21s Maven dependency ordering for branch
        +1 mvninstall 6m 43s trunk passed
        +1 compile 1m 20s trunk passed with JDK v1.8.0_77
        +1 compile 1m 20s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 27s trunk passed
        +1 mvnsite 1m 24s trunk passed
        +1 mvneclipse 0m 25s trunk passed
        +1 findbugs 3m 39s trunk passed
        +1 javadoc 1m 26s trunk passed with JDK v1.8.0_77
        +1 javadoc 2m 9s trunk passed with JDK v1.7.0_95
        0 mvndep 0m 9s Maven dependency ordering for patch
        +1 mvninstall 1m 18s the patch passed
        +1 compile 1m 14s the patch passed with JDK v1.8.0_77
        +1 javac 1m 14s the patch passed
        +1 compile 1m 20s the patch passed with JDK v1.7.0_95
        +1 javac 1m 20s the patch passed
        +1 checkstyle 0m 24s hadoop-hdfs-project: patch generated 0 new + 60 unchanged - 1 fixed = 60 total (was 61)
        +1 mvnsite 1m 20s the patch passed
        +1 mvneclipse 0m 22s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 3m 59s the patch passed
        +1 javadoc 1m 24s the patch passed with JDK v1.8.0_77
        +1 javadoc 2m 8s the patch passed with JDK v1.7.0_95
        +1 unit 0m 49s hadoop-hdfs-client in the patch passed with JDK v1.8.0_77.
        -1 unit 58m 38s hadoop-hdfs in the patch failed with JDK v1.8.0_77.
        +1 unit 0m 57s hadoop-hdfs-client in the patch passed with JDK v1.7.0_95.
        -1 unit 54m 51s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
        +1 asflicense 0m 20s Patch does not generate ASF License warnings.
        151m 0s



        Reason Tests
        JDK v1.8.0_77 Failed junit tests hadoop.hdfs.TestDFSShell
          hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
        JDK v1.7.0_95 Failed junit tests hadoop.hdfs.TestDFSUpgradeFromImage
          hadoop.hdfs.TestHFlush
          hadoop.hdfs.TestDFSShell



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:fbe3e86
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12800318/HDFS-9905.002.patch
        JIRA Issue HDFS-9905
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 9aaaaad12591 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 63e5412
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/15258/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/15258/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15258/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15258/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15258/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15258/console
        Powered by Apache Yetus 0.2.0 http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 12s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. 0 mvndep 0m 21s Maven dependency ordering for branch +1 mvninstall 6m 43s trunk passed +1 compile 1m 20s trunk passed with JDK v1.8.0_77 +1 compile 1m 20s trunk passed with JDK v1.7.0_95 +1 checkstyle 0m 27s trunk passed +1 mvnsite 1m 24s trunk passed +1 mvneclipse 0m 25s trunk passed +1 findbugs 3m 39s trunk passed +1 javadoc 1m 26s trunk passed with JDK v1.8.0_77 +1 javadoc 2m 9s trunk passed with JDK v1.7.0_95 0 mvndep 0m 9s Maven dependency ordering for patch +1 mvninstall 1m 18s the patch passed +1 compile 1m 14s the patch passed with JDK v1.8.0_77 +1 javac 1m 14s the patch passed +1 compile 1m 20s the patch passed with JDK v1.7.0_95 +1 javac 1m 20s the patch passed +1 checkstyle 0m 24s hadoop-hdfs-project: patch generated 0 new + 60 unchanged - 1 fixed = 60 total (was 61) +1 mvnsite 1m 20s the patch passed +1 mvneclipse 0m 22s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 3m 59s the patch passed +1 javadoc 1m 24s the patch passed with JDK v1.8.0_77 +1 javadoc 2m 8s the patch passed with JDK v1.7.0_95 +1 unit 0m 49s hadoop-hdfs-client in the patch passed with JDK v1.8.0_77. -1 unit 58m 38s hadoop-hdfs in the patch failed with JDK v1.8.0_77. +1 unit 0m 57s hadoop-hdfs-client in the patch passed with JDK v1.7.0_95. -1 unit 54m 51s hadoop-hdfs in the patch failed with JDK v1.7.0_95. +1 asflicense 0m 20s Patch does not generate ASF License warnings. 151m 0s Reason Tests JDK v1.8.0_77 Failed junit tests hadoop.hdfs.TestDFSShell   hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl JDK v1.7.0_95 Failed junit tests hadoop.hdfs.TestDFSUpgradeFromImage   hadoop.hdfs.TestHFlush   hadoop.hdfs.TestDFSShell Subsystem Report/Notes Docker Image:yetus/hadoop:fbe3e86 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12800318/HDFS-9905.002.patch JIRA Issue HDFS-9905 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 9aaaaad12591 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 63e5412 Default Java 1.7.0_95 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HDFS-Build/15258/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/15258/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15258/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15258/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15258/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15258/console Powered by Apache Yetus 0.2.0 http://yetus.apache.org This message was automatically generated.
        Hide
        iwasakims Masatake Iwasaki added a comment -

        Thanks for the update, Wei-Chiu Chuang. +1.

        I updated the title and description of the issue. If TestWebHdfsTimeouts still fails on build servers, we can see the root cause and fix it in follow-up JIRA.

        Show
        iwasakims Masatake Iwasaki added a comment - Thanks for the update, Wei-Chiu Chuang . +1. I updated the title and description of the issue. If TestWebHdfsTimeouts still fails on build servers, we can see the root cause and fix it in follow-up JIRA.
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9661 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9661/)
        HDFS-9905. WebHdfsFileSystem#runWithRetry should display original stack (iwasakims: rev 6fcde2e38da04cae3aad6b13cf442af211f71506)

        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTimeouts.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #9661 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9661/ ) HDFS-9905 . WebHdfsFileSystem#runWithRetry should display original stack (iwasakims: rev 6fcde2e38da04cae3aad6b13cf442af211f71506) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTimeouts.java hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
        Hide
        iwasakims Masatake Iwasaki added a comment -

        Committed. Thanks, Wei-Chiu Chuang and Eric Payne.

        Show
        iwasakims Masatake Iwasaki added a comment - Committed. Thanks, Wei-Chiu Chuang and Eric Payne .
        Hide
        jojochuang Wei-Chiu Chuang added a comment -

        Thanks to Kihwal Lee for reporting it, Masatake Iwasaki for committing it and Eric Payne for suggestions!

        Show
        jojochuang Wei-Chiu Chuang added a comment - Thanks to Kihwal Lee for reporting it, Masatake Iwasaki for committing it and Eric Payne for suggestions!
        Hide
        eepayne Eric Payne added a comment -

        HDFS-7163, which introduced this problem, was originally merged back to 2.7, but this JIRA (HDFS-9905), was not. The misleading stack trace is causing difficulty in debugging problems for us, so I cherry-picked this back to 2.7:

        Show
        eepayne Eric Payne added a comment - HDFS-7163 , which introduced this problem, was originally merged back to 2.7, but this JIRA ( HDFS-9905 ), was not. The misleading stack trace is causing difficulty in debugging problems for us, so I cherry-picked this back to 2.7:
        Hide
        brahmareddy Brahma Reddy Battula added a comment -

        Note: it's committed to 2.7.3 and CHANGES.txt is not updated.Bytheway I updated the fix version with 2.7.3.

        Show
        brahmareddy Brahma Reddy Battula added a comment - Note: it's committed to 2.7.3 and CHANGES.txt is not updated.Bytheway I updated the fix version with 2.7.3 .

          People

          • Assignee:
            jojochuang Wei-Chiu Chuang
            Reporter:
            kihwal Kihwal Lee
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development