Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0.2-alpha
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      The following issues in DFSInputStream are addressed in this jira:
      1. read may not retry enough in some cases, causing early failure (see the sketch after this list).
      Assume the following call logic:

       
      readWithStrategy()
        -> blockSeekTo()
        -> readBuffer()
           -> reader.doRead()
           -> seekToNewSource() add currentNode to deadnode, wish to get a different datanode
              -> blockSeekTo()
                 -> chooseDataNode()
                    -> block missing, clear deadNodes and pick the currentNode again
              seekToNewSource() return false
           readBuffer() re-throw the exception quit loop
      readWithStrategy() got the exception, and may fail the read call before trying MaxBlockAcquireFailures times.
      

      2. In a multi-threaded scenario (like hbase), DFSInputStream.failures has a race condition: it is cleared to 0 while it is still in use by another thread, so it is possible that some read thread may never quit. Changing failures to a local variable solves this issue.

      3. If the local datanode is added to deadNodes, it will not be removed from deadNodes when the DN comes back alive. We need a way to remove the local datanode from deadNodes when it becomes live again.
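
      For reference, a minimal toy model of the intended retry contract (simplified, assumed names; a sketch rather than the actual DFSInputStream code): one user-facing read may refresh block locations up to MaxBlockAcquireFailures times, deadNodes is cleared only between whole rounds, and the failure counter is kept per call:

      import java.io.IOException;
      import java.util.HashSet;
      import java.util.List;
      import java.util.Set;

      class RetrySketch {
        interface Replica { byte[] read() throws IOException; String id(); }

        static byte[] readWithRetry(List<Replica> replicas, int maxBlockAcquireFailures)
            throws IOException {
          int failures = 0;                  // per call, so no cross-thread races
          Set<String> deadNodes = new HashSet<>();
          while (true) {
            for (Replica r : replicas) {
              if (deadNodes.contains(r.id())) continue;
              try {
                return r.read();
              } catch (IOException e) {
                deadNodes.add(r.id());       // skip this node for the rest of the round
              }
            }
            // Every candidate failed: count one block-acquire failure, then clear
            // deadNodes and (conceptually) re-fetch block locations and retry.
            if (++failures >= maxBlockAcquireFailures) {
              throw new IOException("could not obtain block after " + failures + " rounds");
            }
            deadNodes.clear();
          }
        }
      }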

      Attachments

      1. HDFS-4273.v8.patch
        27 kB
        Binglin Chang
      2. HDFS-4273.v7.patch
        29 kB
        Binglin Chang
      3. HDFS-4273.v6.patch
        27 kB
        Binglin Chang
      4. HDFS-4273.v5.patch
        27 kB
        Binglin Chang
      5. HDFS-4273.v4.patch
        27 kB
        Binglin Chang
      6. HDFS-4273.v3.patch
        26 kB
        Binglin Chang
      7. HDFS-4273-v2.patch
        26 kB
        Binglin Chang
      8. HDFS-4273.patch
        26 kB
        Binglin Chang
      9. TestDFSInputStream.java
        4 kB
        Binglin Chang

        Issue Links

          Activity

          Binglin Chang created issue -
          Binglin Chang added a comment -

          I wrote a test to reproduce the scenario, and here is the log:

          2012-12-05 22:55:15,135 WARN  hdfs.DFSClient (DFSInputStream.java:readBuffer(596)) - Found Checksum error for BP-50712310-192.168.0.101-1354719313473:blk_-705068286766485620_1002 from 127.0.0.1:50099 at 0
          2012-12-05 22:55:15,136 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(672)) - src: /127.0.0.1:50099, dest: /127.0.0.1:50105, bytes: 4128, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1488457569_1, offset: 0, srvID: DS-91625336-192.168.0.101-50099-1354719314603, blockid: BP-50712310-192.168.0.101-1354719313473:blk_-705068286766485620_1002, duration: 2925000
          2012-12-05 22:55:15,136 INFO  hdfs.DFSClient (DFSInputStream.java:chooseDataNode(741)) - Could not obtain BP-50712310-192.168.0.101-1354719313473:blk_-705068286766485620_1002 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
          2012-12-05 22:55:15,136 WARN  hdfs.DFSClient (DFSInputStream.java:chooseDataNode(756)) - DFS chooseDataNode: got # 1 IOException, will wait for 274.34891931868265 msec.
          2012-12-05 22:55:15,413 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(672)) - src: /127.0.0.1:50099, dest: /127.0.0.1:50106, bytes: 4128, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1488457569_1, offset: 0, srvID: DS-91625336-192.168.0.101-50099-1354719314603, blockid: BP-50712310-192.168.0.101-1354719313473:blk_-705068286766485620_1002, duration: 283000
          2012-12-05 22:55:15,414 INFO  hdfs.StateChange (FSNamesystem.java:reportBadBlocks(4761)) - *DIR* reportBadBlocks
          2012-12-05 22:55:15,415 INFO  BlockStateChange (CorruptReplicasMap.java:addToCorruptReplicasMap(66)) - BLOCK NameSystem.addToCorruptReplicasMap: blk_-705068286766485620 added as corrupt on 127.0.0.1:50099 by null because client machine reported it
          2012-12-05 22:55:15,416 INFO  hdfs.TestClientReportBadBlock (TestDFSInputStream.java:testDFSInputStreamReadRetryTime(94)) - catch IOExceptionorg.apache.hadoop.fs.ChecksumException: Checksum error: /testFile at 0 exp: 809972010 got: -1374622118
          2012-12-05 22:55:15,431 INFO  hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1411)) - Shutting down the Mini HDFS Cluster
          
          Binglin Chang made changes -
          Field Original Value New Value
          Attachment TestDFSInputStream.java [ 12556103 ]
          Suresh Srinivas added a comment -

          You have HDFS-4271 and HDFS-4272 with similar description. Are they accidentally created?

          Binglin Chang added a comment -

          You have HDFS-4271 and HDFS-4272 with similar description. Are they accidentally created?

          Yes, bad Internet connection. I have closed the other two.

          Binglin Chang added a comment -

          And there seems to be another issue here:

            /**
             * This variable tracks the number of failures since the start of the
             * most recent user-facing operation. That is to say, it should be reset
             * whenever the user makes a call on this stream, and if at any point
             * during the retry logic, the failure count exceeds a threshold,
             * the errors will be thrown back to the operation.
             *
             * Specifically this counts the number of times the client has gone
             * back to the namenode to get a new list of block locations, and is
             * capped at maxBlockAcquireFailures
             */
            private int failures = 0;
          ......
            public int read(long position, byte[] buffer, int offset, int length)
              throws IOException {
              // sanity checks
              dfsClient.checkOpen();
              if (closed) {
                throw new IOException("Stream closed");
              }
              failures = 0;
          
          

          Every concurrent "pread" call will reset failures, so if new pread calls keep arriving, it seems all preads may loop infinitely?
          I am assuming int read(long position, byte[] buffer, int offset, int length) is intended for concurrent use? If so, failures should be a local variable.
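
          A self-contained illustration of this race (illustrative names, not DFSClient code): with the counter as a shared field, every new call resets it, so a long-retrying call may never reach the cap; with a local counter, each call owns its own retry budget:

          import java.util.concurrent.atomic.AtomicBoolean;

          class FailuresRaceDemo {
            private static final int CAP = 3;
            private int failures = 0;          // shared field, like DFSInputStream.failures

            // Buggy shape: every new call resets the shared counter, so a concurrent
            // call's reset can push this call's count back before it reaches CAP.
            int preadShared(AtomicBoolean ioStillFailing) {
              failures = 0;
              while (ioStillFailing.get()) {
                if (++failures >= CAP) return -1;  // may never fire under concurrent resets
              }
              return 0;
            }

            // Fixed shape: the counter lives on this call's stack, so each pread
            // keeps its own retry budget regardless of other threads.
            int preadLocal(AtomicBoolean ioStillFailing) {
              int failures = 0;
              while (ioStillFailing.get()) {
                if (++failures >= CAP) return -1;
              }
              return 0;
            }
          }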

          Jing Zhao added a comment -

          I do not think DFSInputStream#read will be used concurrently, so the failures variable reset should be correct. Also, clearing deadNodes looks reasonable, especially when you have multiple replicas (since at that point you do not have any candidate nodes left to try, and some previous temporary "deaths" may already have recovered).

          Binglin Chang added a comment -

          I do not think DFSInputStream#read will be used concurrently.

          Yes "read(byte [] buffer, int offset, int length)" will not be used concurrently. but according the classic semantics of pread "int read(long position, byte[] buffer, int offset, int length)", and no "synchronized" implication, I thought hbase may be using pread concurrently, right?

          Also, to clear deadNodes looks reasonable...

          I'm not against clearing deadNodes and retrying; on the contrary, the current logic means it may not retry enough and quits early.

          Binglin Chang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Binglin Chang added a comment -

          I am referring to read() as int readWithStrategy(ReaderStrategy strategy, int off, int len)
          and to pread() as int read(long position, byte[] buffer, int offset, int length)

          Changes:
          1. Add a new argument "dislike" to chooseDataNode() and bestNode(), to fix seekToNewSource() (see the sketch after this list).
          2. Make failures a local variable, so pread can be thread-safe.
          3. In read(), make the outer layer handle BlockMissingException, bypassing seekToNewSource().
          4. Remove the read retries, because MaxBlockAcquireFailures already handles retrying.
          5. Throw ChecksumException only if we have tried enough times and there is only one replica available.
          In the original logic, whether ChecksumException or BlockMissingException is thrown is somewhat random, depending on the order of the locations from getLocatedBlocks().
          Another alternative is to always throw BlockMissingException (like the pread behavior), but that breaks current test cases.
          6. In pread(), modify the code to follow the same retry logic as read().
          Note that the exception behavior of read() and pread() currently differs:
          read() sometimes throws ChecksumException, pread() never throws ChecksumException. The current patch keeps the same behavior.
          7. Add sanity checks for seek and seekToNewSource.
          8. Add a test to check that DFSInputStream tries MaxBlockAcquireFailures times under error.
          9. Add the same test cases for seekToNewSource as the original test cases for seek.
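
          A sketch of the shape of change 1 (illustrative only, not the actual patch): chooseDataNode() takes an explicit node to exclude, so seekToNewSource() can keep avoiding currentNode even after a block-missing retry clears deadNodes:

          import java.io.IOException;
          import java.util.List;
          import java.util.Set;

          class ChooseWithDislike {
            // Returns the first replica that is neither in deadNodes nor the node
            // the caller dislikes (e.g. currentNode in seekToNewSource()).
            static String chooseDataNode(List<String> locations, Set<String> deadNodes,
                                         String dislike) throws IOException {
              for (String dn : locations) {
                if (deadNodes.contains(dn) || dn.equals(dislike)) continue;
                return dn;
              }
              throw new IOException("No live nodes contain current block");
            }
          }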

          Binglin Chang made changes -
          Attachment HDFS-4273.patch [ 12556273 ]
          Binglin Chang added a comment -

          New version; fixes a bug: forgot to set blockEnd to -1 in two places, seekToNewSource() and read().

          Binglin Chang made changes -
          Attachment HDFS-4273-v2.patch [ 12556369 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12556369/HDFS-4273-v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestEditLog

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3613//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3613//console

          This message is automatically generated.

          Binglin Chang added a comment -

          The failed test is caused by HDFS-4282.

          Binglin Chang added a comment -

          Anyone familiar with this part of the code, could you please give some input on this issue? I'm reviewing the DFSClient code to evaluate the complexity of reimplementing a native HDFS client, and found some issues in the code (not only this one); fixing them will at least make hdfs more robust.

          Luke Lu added a comment -

          HBase definitely uses the positional read API (at least since 0.92 with HFile v2; see HFileBlock.java). OTOH, since the private retries field is not volatile, it might appear to be thread-local in many JVMs, which avoids the infinite loop from concurrent preads in this case.

          The patch fixes the retry logic and LGTM. Maybe "avoid" is better than "dislike"?

          Binglin Chang made changes -
          Affects Version/s 3.0.0 [ 12320356 ]
          Affects Version/s 2.0.3-alpha [ 12323274 ]
          Target Version/s 2.0.3-alpha [ 12323274 ]
          Binglin Chang made changes -
          Target Version/s 2.0.3-alpha [ 12323274 ] 3.0.0, 2.0.3-alpha [ 12320356, 12323274 ]
          Binglin Chang made changes -
          Affects Version/s 2.0.2-alpha [ 12322472 ]
          Affects Version/s 3.0.0 [ 12320356 ]
          Affects Version/s 2.0.3-alpha [ 12323274 ]
          Binglin Chang added a comment -

          Thanks for the review, Luke!
          Here is the new version:
          changed "dislike" to "avoid"

          Binglin Chang made changes -
          Attachment HDFS-4273.v3.patch [ 12562449 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562449/HDFS-4273.v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3696//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3696//console

          This message is automatically generated.

          Luke Lu added a comment -

          Looks like the patch has gone stale...

          Binglin Chang added a comment -

          @Luke, sorry for the delay; attaching the latest version of the patch.

          Binglin Chang made changes -
          Attachment HDFS-4273.v4.patch [ 12614872 ]
          Vinayakumar B added a comment -

          Patch looks good, Binglin.
          Only one small nit.

          A duplicate closed check is done in seekToNewSource:

          +    if (closed) {
          +      throw new IOException("Stream is closed!");
          +    }

          +1 once this is addressed. Let's wait for the Jenkins +1 too.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12614872/HDFS-4273.v4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestShortCircuitLocalRead

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5502//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5502//console

          This message is automatically generated.

          Binglin Chang added a comment -

          Thanks for the review, Vinay! Attaching a new patch addressing your comments and fixing the test timeout.

          Binglin Chang made changes -
          Attachment HDFS-4273.v5.patch [ 12615064 ]
          Vinayakumar B added a comment -

          +1, Latest patch looks good.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12615064/HDFS-4273.v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5516//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5516//console

          This message is automatically generated.

          Binglin Chang added a comment -

          Hi @Luke or @Vinay, it looks like the patch is OK. Could you help get this committed? Thanks!

          Uma Maheswara Rao G added a comment -

          Binglin Chang, let me take a look; I will commit the patch for you.

          Liang Xie added a comment -
               -> seekToNewSource() add currentNode to deadnode, wish to get a different datanode
                  -> blockSeekTo()
                     -> chooseDataNode()
                        -> block missing, clear deadNodes and pick the currentNode again
                  seekToNewSource() return false
          

          I checked the codebase, and it shows:

            private synchronized boolean seekToBlockSource(long targetPos)
                                                           throws IOException {
              currentNode = blockSeekTo(targetPos);
              return true;
            }
          

          It could not return false; it seems the original description is stale?

          Binglin Chang added a comment -

          seekToNewSource, not seekToBlockSource

          Liang Xie added a comment -

          Oh, my bad.

          Binglin Chang added a comment -

          Updated the patch because of a recent commit conflict. Changes:
          Change getStorageID to getDatanodeUuid.
          Uma Maheswara Rao G, if you are reviewing, please use the latest patch.

          Binglin Chang made changes -
          Attachment HDFS-4273.v6.patch [ 12619271 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12619271/HDFS-4273.v6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5756//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5756//console

          This message is automatically generated.

          Binglin Chang added a comment -

          The test failure in TestBlocksWithNotEnoughRacks is not caused by this patch; it is tracked in HDFS-5540.

          LiuLei added a comment -

          Hi Binglin, I have another case.

          I use HBase-0.94 and CDH-4.3.1.
          When a RegionServer reads data from the local datanode and that datanode is dead, the local datanode is added to deadNodes and the RegionServer reads from a remote datanode. But when the local datanode comes back alive, the RegionServer still reads from the remote datanode, which reduces RegionServer performance. We need a way to remove the local datanode from deadNodes when it becomes live again.

          Binglin Chang added a comment -

          Hi LiuLei, this is indeed an issue.
          As I understand it, hbase mostly uses pread to read hfile data from multiple threads, but deadNodes is not per-thread, so the broader issue is that changing deadNodes in one thread affects other threads' logic. This jira does not fix that broader issue.
          I am thinking of a workaround: if a DN in deadNodes is local, we give it an expire time, and after it expires we remove it. This is not perfect, but it should solve the issue most of the time.

          Binglin Chang added a comment -

          Updated the patch. Changes:
          1. Rebase to current trunk.
          2. A local DN in deadNodes can expire; after it expires, it is removed from deadNodes.
          3. Set the static const LOCAL_DEADNODE_EXPIRE_MILLISECONDS to 10 minutes, so a local DN expires within 10 minutes and read operations will then try to use the local DN when possible. Assuming connections to a dead local DN fail fast, the performance impact of the extra retry should be small (see the sketch below).

          We can make LOCAL_DEADNODE_EXPIRE_MILLISECONDS configurable by adding it to dfsclient.conf, if someone thinks it necessary.
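
          A minimal sketch of the expiry workaround (LOCAL_DEADNODE_EXPIRE_MILLISECONDS follows the comment above; the other names are illustrative, not the actual patch): each local-DN entry in deadNodes carries the time it was declared dead and is dropped once it is older than the expiry:

          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          class DeadNodeExpirySketch {
            static final long LOCAL_DEADNODE_EXPIRE_MILLISECONDS = 10 * 60 * 1000;

            // datanode id -> time it was added to deadNodes
            private final Map<String, Long> deadNodes = new ConcurrentHashMap<>();

            void addToDeadNodes(String dnId) {
              deadNodes.put(dnId, System.currentTimeMillis());
            }

            boolean isDead(String dnId, boolean isLocal) {
              Long since = deadNodes.get(dnId);
              if (since == null) {
                return false;
              }
              if (isLocal && System.currentTimeMillis() - since
                  >= LOCAL_DEADNODE_EXPIRE_MILLISECONDS) {
                deadNodes.remove(dnId);      // expired: let reads try the local DN again
                return false;
              }
              return true;
            }
          }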

          Binglin Chang made changes -
          Attachment HDFS-4273.v7.patch [ 12621571 ]
          Binglin Chang made changes -
          Summary Problem in DFSInputStream read retry logic may cause early failure → Fix some issue in DFSInputstream
          Binglin Chang made changes -
          Description updated: the original description (the call trace above, plus two notes: 1. seekToNewSource() logic is broken because it may clear deadNodes in the middle; 2. the variable "int retries=2" in readWithStrategy seems to conflict with MaxBlockAcquireFailures, should it be removed?) was replaced with the three-issue list.
          Binglin Chang made changes -
          Description updated: item 3 filled in (removing the local datanode from deadNodes when the DN comes back alive).
          Binglin Chang made changes -
          Description updated: wording fixes, arriving at the description shown at the top of this issue.
          LiuLei added a comment -

          Hi Binglin,
          We could add an isLocalNodeDead attribute to the DFSClient object: when a DFSInputStream object finds that the local datanode is dead, it sets isLocalNodeDead to true. We would need a thread that detects whether the local datanode is live and, when it is, sets isLocalNodeDead back to false. When a DFSInputStream object chooses a datanode, it checks the isLocalNodeDead attribute (a sketch follows below).

          We can create another jira to discuss the question.
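
          A minimal sketch of this probe-thread idea (all names and the TCP-connect probe are illustrative assumptions, not existing DFSClient code):

          import java.io.IOException;
          import java.net.InetSocketAddress;
          import java.net.Socket;
          import java.util.concurrent.Executors;
          import java.util.concurrent.ScheduledExecutorService;
          import java.util.concurrent.TimeUnit;
          import java.util.concurrent.atomic.AtomicBoolean;

          class LocalDnLivenessProbe {
            final AtomicBoolean isLocalNodeDead = new AtomicBoolean(false);
            private final InetSocketAddress localDnAddr;
            private final ScheduledExecutorService probe =
                Executors.newSingleThreadScheduledExecutor();

            LocalDnLivenessProbe(InetSocketAddress localDnAddr) {
              this.localDnAddr = localDnAddr;
              probe.scheduleWithFixedDelay(this::probeOnce, 10, 10, TimeUnit.SECONDS);
            }

            // A stream would call this when a read from the local DN fails.
            void markLocalNodeDead() {
              isLocalNodeDead.set(true);
            }

            private void probeOnce() {
              if (!isLocalNodeDead.get()) {
                return;                        // only probe while the DN is marked dead
              }
              try (Socket s = new Socket()) {
                s.connect(localDnAddr, 1000);  // the DN answered: mark it live again
                isLocalNodeDead.set(false);
              } catch (IOException ignored) {
                // still unreachable; leave the flag set and try again later
              }
            }
          }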

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12621571/HDFS-4273.v7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5827//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5827//console

          This message is automatically generated.

          Binglin Chang added a comment -

          We can create another jira to discuss the question.

          About your proposed approach: you need to find the local datanode's inet address and a way to detect whether the local datanode is live, so it may take more effort. Sure, we can create another jira.

          Junping Du added a comment -

          The current simple workaround (expiry for the local dead node) looks good to me. LiuLei, please go ahead and create a separate JIRA to discuss other approaches if you have better ideas.
          Hi Uma Maheswara Rao G, are you currently reviewing this jira? If not, I can help review this effort.

          Binglin Chang added a comment -

          Updated the patch to remove the changes related to expiring local deadNodes. Will create another jira addressing it.

          Binglin Chang made changes -
          Attachment HDFS-4273.v8.patch [ 12621932 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12621932/HDFS-4273.v8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5844//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5844//console

          This message is automatically generated.

          Binglin Chang made changes -
          Link This issue is related to HDFS-6022 [ HDFS-6022 ]

            People

            • Assignee:
              Binglin Chang
              Reporter:
              Binglin Chang
            • Votes:
              0
              Watchers:
              15

              Dates

              • Created:
                Updated:

                Development