Hadoop HDFS / HDFS-6308

TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Target Version/s:

      Description

      Found this on a pre-commit build of HDFS-6261:

      java.lang.AssertionError: Expected one valid and one invalid volume
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.assertTrue(Assert.java:41)
      	at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837)
      

        Activity

        Binglin Chang added a comment -

        Related error log:

        2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1418: Call -> /127.0.0.1:58789: getHdfsBlockLocations {tokens { identifier: "" password: "" kind: "" service: "" } tokens { identifier: "" password: "" kind: "" service: "" } blockPoolId: "BP-1664789652-67.195.138.24-1398662297553" blockIds: 1073741825 blockIds: 1073741826}
        2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1419: Call -> /127.0.0.1:45933: getHdfsBlockLocations {tokens { identifier: "" password: "" kind: "" service: "" } tokens { identifier: "" password: "" kind: "" service: "" } blockPoolId: "BP-1664789652-67.195.138.24-1398662297553" blockIds: 1073741825 blockIds: 1073741826}
        2014-04-28 05:18:19,701 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1418: Exception <- localhost/127.0.0.1:58789: getHdfsBlockLocations {java.net.ConnectException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:58789 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused}
        2014-04-28 05:18:19,701 INFO  ipc.Server (Server.java:doRead(762)) - Socket Reader #1 for port 45933: readAndProcess from client 127.0.0.1 threw exception [java.io.IOException: Connection reset by peer]
        java.io.IOException: Connection reset by peer
        	at sun.nio.ch.FileDispatcher.read0(Native Method)
        	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
        	at sun.nio.ch.IOUtil.read(IOUtil.java:171)
        	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
        	at org.apache.hadoop.ipc.Server.channelRead(Server.java:2644)
        	at org.apache.hadoop.ipc.Server.access$2800(Server.java:133)
        	at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1517)
        	at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:753)
        	at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:627)
        	at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:598)
        2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1419: Exception <- /127.0.0.1:45933: getHdfsBlockLocations {java.net.SocketTimeoutException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 remote=/127.0.0.1:45933]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout}
        2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1415: Exception <- localhost/127.0.0.1:45933: getHdfsBlockLocations {java.net.SocketTimeoutException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 remote=/127.0.0.1:45933]; For more details see:  
        

        The socket read/write timeout is set to 1500 ms. A timeout error is global (per connection), so when a timeout occurs, all calls on that connection are marked as timed out. The expected behavior, however, is that the first call times out while the second call completes normally.

        There is a simple fix: invoke the second call only after the connection is known to be closed.

        We can consider improving ipc.Client later to prevent this kind of corner case.
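        The race described above can be sketched with a small toy model. The class and method names below are hypothetical, not the real ipc.Client API; the model only illustrates the per-connection failure semantics: when one call on a shared connection times out, every pending call on that connection fails, and issuing the second call after the connection has closed avoids the collateral failure.

        ```java
        import java.util.ArrayList;
        import java.util.List;

        public class ConnectionTimeoutSketch {

            /** Toy stand-in for a client connection shared by multiple RPC calls. */
            static class SharedConnection {
                private final List<String> pending = new ArrayList<>();
                private final List<String> failed = new ArrayList<>();
                private boolean closed = false;

                /** Issue a call; on a closed connection the real client would open a fresh one. */
                void issue(String call) {
                    if (!closed) {
                        pending.add(call);
                    }
                }

                /** A timeout is per-connection: all pending calls fail together, then the connection closes. */
                void timeout() {
                    failed.addAll(pending);
                    pending.clear();
                    closed = true;
                }

                List<String> failedCalls() { return failed; }
                boolean isClosed() { return closed; }
            }

            public static void main(String[] args) {
                // Flaky scenario: both calls are in flight on one connection when it times out,
                // so the second call is marked timed out as well.
                SharedConnection flaky = new SharedConnection();
                flaky.issue("call-1");
                flaky.issue("call-2");
                flaky.timeout();
                System.out.println("flaky failed calls: " + flaky.failedCalls()); // both calls fail

                // Fixed scenario: wait until the first timeout has closed the connection,
                // then issue the second call, which would go over a new connection.
                SharedConnection fixed = new SharedConnection();
                fixed.issue("call-1");
                fixed.timeout();
                fixed.issue("call-2"); // issued after close; not caught in the mass failure
                System.out.println("fixed failed calls: " + fixed.failedCalls()); // only the first call fails
            }
        }
        ```

        The test change in the patch follows the "fixed" ordering: it makes sure the timed-out connection is fully closed before the next call, so only the intended call observes the timeout.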

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12642622/HDFS-6308.v1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6772//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6772//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12642622/HDFS-6308.v1.patch
        against trunk revision db73cc9.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 287 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8969//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8969//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8969//console

        This message is automatically generated.


          People

          • Assignee:
            Binglin Chang
          • Reporter:
            Binglin Chang
          • Votes:
            0
          • Watchers:
            3
