Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-6308

TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Found this on pre-commit build of HDFS-6261

      java.lang.AssertionError: Expected one valid and one invalid volume
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.assertTrue(Assert.java:41)
      	at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837)
      

        Activity

        Hide
        decster Binglin Chang added a comment -

        Related error log:

        2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1418: Call -> /127.0.0.1:58789: getHdfsBlockLocations {tokens { identifier: "" password: "" kind: "" service: "" } tokens { identifier: "" password: "" kind: "" service: "" } blockPoolId: "BP-1664789652-67.195.138.24-1398662297553" blockIds: 1073741825 blockIds: 1073741826}
        2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1419: Call -> /127.0.0.1:45933: getHdfsBlockLocations {tokens { identifier: "" password: "" kind: "" service: "" } tokens { identifier: "" password: "" kind: "" service: "" } blockPoolId: "BP-1664789652-67.195.138.24-1398662297553" blockIds: 1073741825 blockIds: 1073741826}
        2014-04-28 05:18:19,701 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1418: Exception <- localhost/127.0.0.1:58789: getHdfsBlockLocations {java.net.ConnectException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:58789 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused}
        2014-04-28 05:18:19,701 INFO  ipc.Server (Server.java:doRead(762)) - Socket Reader #1 for port 45933: readAndProcess from client 127.0.0.1 threw exception [java.io.IOException: Connection reset by peer]
        java.io.IOException: Connection reset by peer
        	at sun.nio.ch.FileDispatcher.read0(Native Method)
        	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
        	at sun.nio.ch.IOUtil.read(IOUtil.java:171)
        	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
        	at org.apache.hadoop.ipc.Server.channelRead(Server.java:2644)
        	at org.apache.hadoop.ipc.Server.access$2800(Server.java:133)
        	at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1517)
        	at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:753)
        	at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:627)
        	at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:598)
        2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1419: Exception <- /127.0.0.1:45933: getHdfsBlockLocations {java.net.SocketTimeoutException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 remote=/127.0.0.1:45933]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout}
        2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1415: Exception <- localhost/127.0.0.1:45933: getHdfsBlockLocations {java.net.SocketTimeoutException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 remote=/127.0.0.1:45933]; For more details see:  
        

        socket read/write timeout is set to 1500ms, timeout error is global(per connection), so when timeout occurs, all calls in this connection are marked timeout, but the expected behavior should be: first call timeout, second call normal.

        There is a simple fix, just invoke second call after the connection is closed for sure.

        We can consider improving ipc.Client to prevent this kind of corner case later.

        Show
        decster Binglin Chang added a comment - Related error log: 2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1418: Call -> /127.0.0.1:58789: getHdfsBlockLocations {tokens { identifier: "" password: " " kind: " " service: " " } tokens { identifier: " " password: " " kind: " " service: " " } blockPoolId: " BP-1664789652-67.195.138.24-1398662297553" blockIds: 1073741825 blockIds: 1073741826} 2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1419: Call -> /127.0.0.1:45933: getHdfsBlockLocations {tokens { identifier: "" password: " " kind: " " service: " " } tokens { identifier: " " password: " " kind: " " service: " " } blockPoolId: " BP-1664789652-67.195.138.24-1398662297553" blockIds: 1073741825 blockIds: 1073741826} 2014-04-28 05:18:19,701 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1418: Exception <- localhost/127.0.0.1:58789: getHdfsBlockLocations {java.net.ConnectException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:58789 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http: //wiki.apache.org/hadoop/ConnectionRefused} 2014-04-28 05:18:19,701 INFO ipc.Server (Server.java:doRead(762)) - Socket Reader #1 for port 45933: readAndProcess from client 127.0.0.1 threw exception [java.io.IOException: Connection reset by peer] java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198) at sun.nio.ch.IOUtil.read(IOUtil.java:171) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243) at org.apache.hadoop.ipc.Server.channelRead(Server.java:2644) at org.apache.hadoop.ipc.Server.access$2800(Server.java:133) at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1517) at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:753) at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:627) at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:598) 2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1419: Exception <- /127.0.0.1:45933: getHdfsBlockLocations {java.net.SocketTimeoutException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 remote=/127.0.0.1:45933]; For more details see: http: //wiki.apache.org/hadoop/SocketTimeout} 2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1415: Exception <- localhost/127.0.0.1:45933: getHdfsBlockLocations {java.net.SocketTimeoutException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 remote=/127.0.0.1:45933]; For more details see: socket read/write timeout is set to 1500ms, timeout error is global(per connection), so when timeout occurs, all calls in this connection are marked timeout, but the expected behavior should be: first call timeout, second call normal. There is a simple fix, just invoke second call after the connection is closed for sure. We can consider improving ipc.Client to prevent this kind of corner case later.
        Hide
        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12642622/HDFS-6308.v1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6772//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6772//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642622/HDFS-6308.v1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6772//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6772//console This message is automatically generated.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12642622/HDFS-6308.v1.patch
        against trunk revision db73cc9.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 287 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8969//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8969//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8969//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642622/HDFS-6308.v1.patch against trunk revision db73cc9. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 287 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8969//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8969//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8969//console This message is automatically generated.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        -1 patch 0m 5s HDFS-6308 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



        Subsystem Report/Notes
        JIRA Issue HDFS-6308
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12642622/HDFS-6308.v1.patch
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18067/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. -1 patch 0m 5s HDFS-6308 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. Subsystem Report/Notes JIRA Issue HDFS-6308 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12642622/HDFS-6308.v1.patch Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18067/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        -1 patch 0m 5s HDFS-6308 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



        Subsystem Report/Notes
        JIRA Issue HDFS-6308
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12642622/HDFS-6308.v1.patch
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/21369/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. -1 patch 0m 5s HDFS-6308 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. Subsystem Report/Notes JIRA Issue HDFS-6308 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12642622/HDFS-6308.v1.patch Console output https://builds.apache.org/job/PreCommit-HDFS-Build/21369/console Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.

          People

          • Assignee:
            decster Binglin Chang
            Reporter:
            decster Binglin Chang
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:

              Development