Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-7930

commitBlockSynchronization() does not remove locations

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When commitBlockSynchronization() has less newTargets than in the original block it does not remove unconfirmed locations. This results in that the the block stores locations of different lengths or genStamp (corrupt).

      1. HDFS-7930-2.6.0.patch
        4 kB
        Sangjin Lee
      2. HDFS-7930.003.patch
        6 kB
        Yi Liu
      3. HDFS-7930.002.patch
        6 kB
        Yi Liu
      4. HDFS-7930.001.patch
        7 kB
        Yi Liu

        Issue Links

          Activity

          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Sangjin Lee backported this to 2.6.1. I just pushed the commit to 2.6.1 after running compilation.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Sangjin Lee backported this to 2.6.1. I just pushed the commit to 2.6.1 after running compilation.
          Hide
          sjlee0 Sangjin Lee added a comment -

          The backport to 2.6.0 is more or less straightforward but for the following:

          • the test changes were made in TestFileTruncate which does not exist in 2.6, so it does not apply
          • changed the method argument type from BlockInfoContiguous (only in 2.7) to BlockInfo
          • modified the logging statement (the upstream code is for slf4j Logger which is new in 2.7)

          I'm going to attach a suggested patch for 2.6.0.

          Show
          sjlee0 Sangjin Lee added a comment - The backport to 2.6.0 is more or less straightforward but for the following: the test changes were made in TestFileTruncate which does not exist in 2.6, so it does not apply changed the method argument type from BlockInfoContiguous (only in 2.7) to BlockInfo modified the logging statement (the upstream code is for slf4j Logger which is new in 2.7) I'm going to attach a suggested patch for 2.6.0.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2088 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2088/)
          HDFS-7930. commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #2088 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2088/ ) HDFS-7930 . commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199) hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #138 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/138/)
          HDFS-7930. commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #138 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/138/ ) HDFS-7930 . commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #129 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/129/)
          HDFS-7930. commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #129 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/129/ ) HDFS-7930 . commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2070 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2070/)
          HDFS-7930. commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #2070 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2070/ ) HDFS-7930 . commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199) hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #872 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/872/)
          HDFS-7930. commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #872 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/872/ ) HDFS-7930 . commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #138 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/138/)
          HDFS-7930. commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #138 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/138/ ) HDFS-7930 . commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199) hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #7378 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7378/)
          HDFS-7930. commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #7378 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7378/ ) HDFS-7930 . commitBlockSynchronization() does not remove locations. (yliu) (yliu: rev e37ca221bf4e9ae5d5e667d8ca284df9fdb33199) hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          Hide
          hitliuyi Yi Liu added a comment -

          Thanks a lot for the review, Konstantin Shvachko! Also thanks Byron Wong for the comment.
          Committed it to trunk, branch-2 and branch-2.7.

          Show
          hitliuyi Yi Liu added a comment - Thanks a lot for the review, Konstantin Shvachko ! Also thanks Byron Wong for the comment. Committed it to trunk, branch-2 and branch-2.7.
          Hide
          shv Konstantin Shvachko added a comment -

          I liked your changes in the earlier patch for testTruncateWithDataNodesRestart(), but just removing triggerBlockReports() is enough coverage for the condition this jira is fixing.
          +1 for the latest patch.

          Show
          shv Konstantin Shvachko added a comment - I liked your changes in the earlier patch for testTruncateWithDataNodesRestart() , but just removing triggerBlockReports() is enough coverage for the condition this jira is fixing. +1 for the latest patch.
          Hide
          hitliuyi Yi Liu added a comment -

          If I add triggerBlockReports() before waitReplication() then the test passes, as it finally triggers deletion of the replica on the DN.

          Right.
          As described in #1 above, after dn0 restarts and block reports happen, it's replica of last block is marked as corrupt and dn0 is added to CorruptReplicasMap, but as we see the postponing log message, the block replica is not removed and dn0 is not removed from CorruptReplicasMap. So waitReplication() will timeout (We only have 3 DNs).
          If we add triggerBlockReports, then replica of last block is marked as corrupt again, this time, nr.replicasOnStaleNodes() > 0 is false and it will not postpone. The block replica will be removed and also clear it from CorruptReplicasMap.

          Show
          hitliuyi Yi Liu added a comment - If I add triggerBlockReports() before waitReplication() then the test passes, as it finally triggers deletion of the replica on the DN. Right. As described in #1 above, after dn0 restarts and block reports happen, it's replica of last block is marked as corrupt and dn0 is added to CorruptReplicasMap , but as we see the postponing log message, the block replica is not removed and dn0 is not removed from CorruptReplicasMap . So waitReplication() will timeout (We only have 3 DNs). If we add triggerBlockReports , then replica of last block is marked as corrupt again, this time, nr.replicasOnStaleNodes() > 0 is false and it will not postpone. The block replica will be removed and also clear it from CorruptReplicasMap .
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12705521/HDFS-7930.003.patch
          against trunk revision 93d0f4a.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.tracing.TestTracing
          org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9973//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9973//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705521/HDFS-7930.003.patch against trunk revision 93d0f4a. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.tracing.TestTracing org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9973//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9973//console This message is automatically generated.
          Hide
          hitliuyi Yi Liu added a comment -

          I think it's fine to remove the modification to testTruncateWithDataNodesRestart in the patch, the reason is as #2, #3 above. We already have #4, how do you think?

          I update the patch to add a logging message in markBlockReplicasAsCorrupt() reporting which locations are being marked as corrupt as you suggested.

          Show
          hitliuyi Yi Liu added a comment - I think it's fine to remove the modification to testTruncateWithDataNodesRestart in the patch, the reason is as #2, #3 above. We already have #4, how do you think? I update the patch to add a logging message in markBlockReplicasAsCorrupt() reporting which locations are being marked as corrupt as you suggested.
          Hide
          hitliuyi Yi Liu added a comment - - edited

          Konstantin Shvachko, thanks for your nice discussion. I took further look and found more:
          1.

          but I don't know what is the reason for postponing replica deletion. Postponing should probably be avoided in this case, since the commitBlockSync() is as good as block report for the particular block.

          The result is as following:

          • The postponing log in invalidateBlock called by markBlockAsCorrupt is not by result of block recovery, and it's by result of block report from dn0 after it restarts.
          • Because as discussed, we will call setDataNodeDead after dn0 shutdown, it results in NN checks heartbeat and finds dn0 dead then remove dn0. Later we do truncate, block recovery happens, and genStamp of last block is increased. Then dn0 restarts successfully and block report happens, but it's replica of last block has old genStamp, so it will be marked as corrupted, when NN invalidates it's block replica, nr.replicasOnStaleNodes() > 0 is true, which is because DatanodeStorageInfo#blockContentsStale for dn0 in NN is true at this time. blockContentsStale is set to false in DatanodeStorageInfo#receivedBlockReport which is called after we finish processing block report in BlockManager.

          2.

          otherwise NN can schedule recovery (with 1/3 probablity) on the dead DN as a primary. It will wait for the heartbeat from it, but will never receive it because it is down.

          Yes, the checkBlockRecovery() timeout is because of this. Currently when starting block recovery, NN chooses a DN (we say dn0) as primary, dn0 can get the BlockRecoveryCommand through the response of it's heartbeat to NN. but if the dn0 is dead, blockrecovery will not happen. Until expiredHardLimit is true in LeaseManager#checkLeases, then a new DN will be chosen as the primary and schedule a new block recovery for that last block.

          3.

          You need to call setDataNodeDead() after shutdown()

          Our modification to testTruncateWithDataNodesRestart is to assert replicas is decreased by 1 because replica of last block on dn0 should be marked as corrupt in NN after block recovery.
          If we set setDataNodeDead after dn0 shutdown and before truncate, the replicas is already decreased by 1, then new added assertion doesn't make sense.

          4.

          BTW, your change completely eliminates the failure of testTruncateWithDataNodesRestartImmediately() from HDFS-7886, which I ran without triggerBlockReports().

          Right

          Show
          hitliuyi Yi Liu added a comment - - edited Konstantin Shvachko , thanks for your nice discussion. I took further look and found more: 1. but I don't know what is the reason for postponing replica deletion. Postponing should probably be avoided in this case, since the commitBlockSync() is as good as block report for the particular block. The result is as following: The postponing log in invalidateBlock called by markBlockAsCorrupt is not by result of block recovery, and it's by result of block report from dn0 after it restarts. Because as discussed, we will call setDataNodeDead after dn0 shutdown, it results in NN checks heartbeat and finds dn0 dead then remove dn0 . Later we do truncate, block recovery happens, and genStamp of last block is increased. Then dn0 restarts successfully and block report happens, but it's replica of last block has old genStamp, so it will be marked as corrupted, when NN invalidates it's block replica, nr.replicasOnStaleNodes() > 0 is true , which is because DatanodeStorageInfo#blockContentsStale for dn0 in NN is true at this time. blockContentsStale is set to false in DatanodeStorageInfo#receivedBlockReport which is called after we finish processing block report in BlockManager . 2. otherwise NN can schedule recovery (with 1/3 probablity) on the dead DN as a primary. It will wait for the heartbeat from it, but will never receive it because it is down. Yes, the checkBlockRecovery() timeout is because of this. Currently when starting block recovery, NN chooses a DN (we say dn0) as primary, dn0 can get the BlockRecoveryCommand through the response of it's heartbeat to NN. but if the dn0 is dead, blockrecovery will not happen. Until expiredHardLimit is true in LeaseManager#checkLeases , then a new DN will be chosen as the primary and schedule a new block recovery for that last block. 3. You need to call setDataNodeDead() after shutdown() Our modification to testTruncateWithDataNodesRestart is to assert replicas is decreased by 1 because replica of last block on dn0 should be marked as corrupt in NN after block recovery. If we set setDataNodeDead after dn0 shutdown and before truncate , the replicas is already decreased by 1, then new added assertion doesn't make sense. 4. BTW, your change completely eliminates the failure of testTruncateWithDataNodesRestartImmediately() from HDFS-7886 , which I ran without triggerBlockReports(). Right
          Hide
          hitliuyi Yi Liu added a comment -

          Thanks Konstantin Shvachko for comments, I will update the patch and respond to you later. (yesterday I was not available.)

          Show
          hitliuyi Yi Liu added a comment - Thanks Konstantin Shvachko for comments, I will update the patch and respond to you later. (yesterday I was not available.)
          Hide
          shv Konstantin Shvachko added a comment -

          Although this will not fix the testTruncateWithDataNodesRestart() completely. The location is correctly invalidated on the NN, but then NN postpones invalidation on the DN and waits for the next report.

          2015-03-18 15:11:02,922 INFO  BlockStateChange (CorruptReplicasMap.java:addToCorruptReplicasMap(76)) - BLOCK NameSystem.addToCorruptReplicasMap: blk_1073741827 added as corrupt on 127.0.0.1:46044 by localhost/127.0.0.1  because block is COMPLETE and reported genstamp 1003 does not match genstamp in block map 1004
          2015-03-18 15:11:02,922 INFO  BlockStateChange (BlockManager.java:invalidateBlock(1215)) - BLOCK* invalidateBlock: blk_1073741827_1003(stored=blk_1073741827_1004) on 127.0.0.1:46044
          2015-03-18 15:11:02,922 INFO  BlockStateChange (BlockManager.java:invalidateBlock(1225)) - BLOCK* invalidateBlocks: postponing invalidation of blk_1073741827_1003(stored=blk_1073741827_1004) on 127.0.0.1:46044 because 1 replica(s) are located on nodes with potentially out-of-date block reports
          

          If I add triggerBlockReports() before waitReplication() then the test passes, as it finally triggers deletion of the replica on the DN.
          I am fine fixing the test by adding triggerBlockReports() as above, but I don't know what is the reason for postponing replica deletion. Postponing should probably be avoided in this case, since the commitBlockSync() is as good as block report for the particular block.

          BTW, your change completely eliminates the failure of testTruncateWithDataNodesRestartImmediately() from HDFS-7886, which I ran without triggerBlockReports().

          Show
          shv Konstantin Shvachko added a comment - Although this will not fix the testTruncateWithDataNodesRestart() completely. The location is correctly invalidated on the NN, but then NN postpones invalidation on the DN and waits for the next report. 2015-03-18 15:11:02,922 INFO BlockStateChange (CorruptReplicasMap.java:addToCorruptReplicasMap(76)) - BLOCK NameSystem.addToCorruptReplicasMap: blk_1073741827 added as corrupt on 127.0.0.1:46044 by localhost/127.0.0.1 because block is COMPLETE and reported genstamp 1003 does not match genstamp in block map 1004 2015-03-18 15:11:02,922 INFO BlockStateChange (BlockManager.java:invalidateBlock(1215)) - BLOCK* invalidateBlock: blk_1073741827_1003(stored=blk_1073741827_1004) on 127.0.0.1:46044 2015-03-18 15:11:02,922 INFO BlockStateChange (BlockManager.java:invalidateBlock(1225)) - BLOCK* invalidateBlocks: postponing invalidation of blk_1073741827_1003(stored=blk_1073741827_1004) on 127.0.0.1:46044 because 1 replica(s) are located on nodes with potentially out-of-date block reports If I add triggerBlockReports() before waitReplication() then the test passes, as it finally triggers deletion of the replica on the DN. I am fine fixing the test by adding triggerBlockReports() as above, but I don't know what is the reason for postponing replica deletion. Postponing should probably be avoided in this case, since the commitBlockSync() is as good as block report for the particular block. BTW, your change completely eliminates the failure of testTruncateWithDataNodesRestartImmediately() from HDFS-7886 , which I ran without triggerBlockReports() .
          Hide
          shv Konstantin Shvachko added a comment -

          Also it seems you need to use cluster.stopDataNode(0) rather than directly shutting it down, otherwise it wont remove the DN reference from MiniCluster.
          And then for restarting I suggest using restartDataNode(DataNodeProperties).

          Show
          shv Konstantin Shvachko added a comment - Also it seems you need to use cluster.stopDataNode(0) rather than directly shutting it down, otherwise it wont remove the DN reference from MiniCluster. And then for restarting I suggest using restartDataNode(DataNodeProperties) .
          Hide
          shv Konstantin Shvachko added a comment -

          Actually taking the last observation back.
          You need to call setDataNodeDead() after shutdown(), otherwise NN can schedule recovery (with 1/3 probablity) on the dead DN as a primary. It will wait for the heartbeat from it, but will never receive it because it is down.

          Show
          shv Konstantin Shvachko added a comment - Actually taking the last observation back. You need to call setDataNodeDead() after shutdown() , otherwise NN can schedule recovery (with 1/3 probablity) on the dead DN as a primary. It will wait for the heartbeat from it, but will never receive it because it is down.
          Hide
          shv Konstantin Shvachko added a comment -

          Looks good now. One last thing is that we should add a logging message in markBlockReplicasAsCorrupt() reporting which locations are being marked as corrupt. We can do it on the info level. This will help finding problems, if there are any.
          Also, the test failure is related to the new code. The very first checkBlockRecovery() times out. I did not look deep, but it could be because we only have 3 DNs. One replica out of three is corrupt, but there are no more targets to replicate it before the corrupt replica can be removed. The log message would have helped.

          Show
          shv Konstantin Shvachko added a comment - Looks good now. One last thing is that we should add a logging message in markBlockReplicasAsCorrupt() reporting which locations are being marked as corrupt. We can do it on the info level. This will help finding problems, if there are any. Also, the test failure is related to the new code. The very first checkBlockRecovery() times out. I did not look deep, but it could be because we only have 3 DNs. One replica out of three is corrupt, but there are no more targets to replicate it before the corrupt replica can be removed. The log message would have helped.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12705086/HDFS-7930.002.patch
          against trunk revision 7179f94.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestFileTruncate

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9921//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9921//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705086/HDFS-7930.002.patch against trunk revision 7179f94. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestFileTruncate Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9921//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9921//console This message is automatically generated.
          Hide
          hitliuyi Yi Liu added a comment -

          Thanks Konstantin Shvachko for review, update the patch to address all your comments.

          Show
          hitliuyi Yi Liu added a comment - Thanks Konstantin Shvachko for review, update the patch to address all your comments.
          Hide
          shv Konstantin Shvachko added a comment -

          Looks like the right fix. Few suggestions:

          1. Merge test cases testTruncateWithDataNodesRestart() and testTruncateWithDataNodesRestart1() into one. The latter seems to be a more detailed case of the former.
          2. Also you should use restartDataNode() with expireOnNN == true now that HDFS-7886 is in.
          3. Remove misleading comment in FSN.commitBlockSync()
            // There should be no locations in the blockManager till now ....
          4. Few improvements to markBlockReplicasAsCorrupt()
            • excludedStorages should be newStorages or committedStorages
            • Better keep the parameter declaration on a single line
            • toMark could be isCorrupt
            • Just else return; when the genStamp and the length are the same as the old ones.
            • Redundant check excludedStorages.length > 0
          Show
          shv Konstantin Shvachko added a comment - Looks like the right fix. Few suggestions: Merge test cases testTruncateWithDataNodesRestart() and testTruncateWithDataNodesRestart1() into one. The latter seems to be a more detailed case of the former. Also you should use restartDataNode() with expireOnNN == true now that HDFS-7886 is in. Remove misleading comment in FSN.commitBlockSync() // There should be no locations in the blockManager till now .... Few improvements to markBlockReplicasAsCorrupt() excludedStorages should be newStorages or committedStorages Better keep the parameter declaration on a single line toMark could be isCorrupt Just else return; when the genStamp and the length are the same as the old ones. Redundant check excludedStorages.length > 0
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12704734/HDFS-7930.001.patch
          against trunk revision 3ff1ba2.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
          org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9898//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9898//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704734/HDFS-7930.001.patch against trunk revision 3ff1ba2. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9898//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9898//console This message is automatically generated.
          Hide
          hitliuyi Yi Liu added a comment -

          I add a test in TestFileTruncate since it includes block recovery and doesn't create a new class for testing block recovery.
          Following part can be as the regression test for this fix.

          ...
          checkBlockRecovery(p);
          LocatedBlock newBlock = getLocatedBlocks(p).getLastLocatedBlock();
          asertEquals(newBlock.getLocations().length, REPLICATION - 1);
          ...

          Show
          hitliuyi Yi Liu added a comment - I add a test in TestFileTruncate since it includes block recovery and doesn't create a new class for testing block recovery. Following part can be as the regression test for this fix. ... checkBlockRecovery(p); LocatedBlock newBlock = getLocatedBlocks(p).getLastLocatedBlock(); asertEquals(newBlock.getLocations().length, REPLICATION - 1); ...
          Hide
          hitliuyi Yi Liu added a comment - - edited

          Konstantin Shvachko, you are right.
          After block recovery, if commitBlockSynchronization has less newTargets than in the original block, we should mark replicas of that block corrupt on those DNs (Let's call them temporarily-down-DNs) who have not participated in block recovery (maybe shutdown), since the genStamp of the block on those temporarily-down-DNs should be old. While current implementation doesn't do this. Unless those temporarily-down-DNs are restarted again and NN receives block report from them, NN will mark replicas of that block corrupt.

          This results in that user could get blocks with corrupt locations of different lengths or genStamp as you said, it's an critical issue.

          As I, you and Plamen discussed in HDFS-7886, in those tests, we rely on block report indeed happens after DN restarted, the reason is that NN will mark the replicas on bad DN corrupt and truncated block will be replicated to the bad DN after it restarted. After this fix, triggerBlockReports can be removed in testTruncateWithDataNodesRestartImmediately.

          Show
          hitliuyi Yi Liu added a comment - - edited Konstantin Shvachko , you are right. After block recovery, if commitBlockSynchronization has less newTargets than in the original block, we should mark replicas of that block corrupt on those DNs (Let's call them temporarily-down-DNs) who have not participated in block recovery (maybe shutdown), since the genStamp of the block on those temporarily-down-DNs should be old. While current implementation doesn't do this. Unless those temporarily-down-DNs are restarted again and NN receives block report from them, NN will mark replicas of that block corrupt . This results in that user could get blocks with corrupt locations of different lengths or genStamp as you said, it's an critical issue. As I, you and Plamen discussed in HDFS-7886 , in those tests, we rely on block report indeed happens after DN restarted, the reason is that NN will mark the replicas on bad DN corrupt and truncated block will be replicated to the bad DN after it restarted. After this fix, triggerBlockReports can be removed in testTruncateWithDataNodesRestartImmediately.
          Hide
          Byron Wong Byron Wong added a comment -

          The failure is reproducible (after applying the latest patch from HDFS-7886) by commenting out the cluster.triggerBlockReports(); in testTruncateWithDataNodesRestartImmediately and running all test in TestFileTruncate.

          Show
          Byron Wong Byron Wong added a comment - The failure is reproducible (after applying the latest patch from HDFS-7886 ) by commenting out the cluster.triggerBlockReports(); in testTruncateWithDataNodesRestartImmediately and running all test in TestFileTruncate.
          Hide
          shv Konstantin Shvachko added a comment -

          I saw it with truncate in HDFS-7886, described in this comment. But it can also happen with a regular block recovery, particularly when DNs are restarting during the recovery.
          If recovery is started for a UC-block, which already has locations from live DNs, then recovery may succeed only for some of those locations, because the others have e.g. a different length. But commitBlockSynchronization() will not remove the unconfirmed locations. The locations will be invalidated by the next block report and then replicated correctly, but until then reads may see different data or fail.
          Will post a failing test once HDFS-7886 is in.
          Marked it is a blocker for 2.7.0. Feel free to unmark if it is not.

          Show
          shv Konstantin Shvachko added a comment - I saw it with truncate in HDFS-7886 , described in this comment . But it can also happen with a regular block recovery, particularly when DNs are restarting during the recovery. If recovery is started for a UC-block, which already has locations from live DNs, then recovery may succeed only for some of those locations, because the others have e.g. a different length. But commitBlockSynchronization() will not remove the unconfirmed locations. The locations will be invalidated by the next block report and then replicated correctly, but until then reads may see different data or fail. Will post a failing test once HDFS-7886 is in. Marked it is a blocker for 2.7.0. Feel free to unmark if it is not.

            People

            • Assignee:
              hitliuyi Yi Liu
              Reporter:
              shv Konstantin Shvachko
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development