HDFS-3157 (Hadoop HDFS)

"Error in deleting blocks" keeps coming from the DN even after the block report and directory scanning have happened

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 0.24.0
    • Fix Version/s: 2.0.2-alpha
    • Component/s: namenode
    • Labels: None

      Description

      Cluster setup:

      1 NN, three DNs (DN1, DN2, DN3), replication factor 2, "dfs.blockreport.intervalMsec" = 300, "dfs.datanode.directoryscan.interval" = 1

      Step 1: write one file "a.txt" with sync (not closed).
      Step 2: delete the block on one of the datanodes to which replication happened, say DN1 (from its rbw directory).
      Step 3: close the file.

      Since the replication factor is 2, the block is replicated to another datanode.
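
      As a hedged sketch (not the attached test), the reproduction could be scripted against a MiniDFSCluster roughly as follows; the file name comes from the steps above, while the write size and the out-of-band rbw deletion are illustrative assumptions:

      Configuration conf = new HdfsConfiguration();
      conf.setLong("dfs.blockreport.intervalMsec", 300);
      conf.setInt("dfs.datanode.directoryscan.interval", 1);
      MiniDFSCluster cluster =
          new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
      try {
        FileSystem fs = cluster.getFileSystem();
        Path file = new Path("/a.txt");

        // Step 1: write with sync (hflush) but do not close, so the
        // replicas stay in the rbw directory on the pipeline DNs.
        FSDataOutputStream out = fs.create(file, (short) 2);
        out.write(new byte[1024]);
        out.hflush();

        // Step 2: delete the replica file from the rbw directory of one
        // pipeline DN, out of band on that DN's local disk.

        // Step 3: close the file; the block is finalized with a newer
        // genstamp and re-replicated to another DN.
        out.close();
      } finally {
        cluster.shutdown();
      }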

      Then, at the NN side, the following command is issued to the DN from which the block was deleted:

      2012-03-19 13:41:36,905 INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_2903555284838653156 to add as corrupt on XX.XX.XX.XX by /XX.XX.XX.XX because reported RBW replica with genstamp 1002 does not match COMPLETE block's genstamp in block map 1003
      2012-03-19 13:41:39,588 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block blk_2903555284838653156_1003 from neededReplications as it has enough replicas.
      

      On the datanode side where the block was deleted, the following exception occurred:

      2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected error trying to delete block blk_2903555284838653156_1003. BlockInfo not found in volumeMap.
      2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command
      java.io.IOException: Error in deleting blocks.
      	at org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:2061)
      	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:581)
      	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:545)
      	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:690)
      	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:522)
      	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:662)
      	at java.lang.Thread.run(Thread.java:619)
      
      Attachments

      1. h3157_20120618.patch
        12 kB
        Tsz Wo Nicholas Sze
      2. HDFS-3157.patch
        8 kB
        Uma Maheswara Rao G
      3. HDFS-3157.patch
        8 kB
        Ashish Singhi
      4. HDFS-3157.patch
        8 kB
        Ashish Singhi
      5. HDFS-3157-1.patch
        11 kB
        Ashish Singhi
      6. HDFS-3157-1.patch
        11 kB
        Ashish Singhi
      7. HDFS-3157-2.patch
        11 kB
        Ashish Singhi
      8. HDFS-3157-3.patch
        10 kB
        Ashish Singhi
      9. HDFS-3157-3.patch
        10 kB
        Ashish Singhi
      10. HDFS-3157-4.patch
        10 kB
        Ashish Singhi
      11. HDFS-3157-5.patch
        20 kB
        Ashish Singhi

        Issue Links

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1127 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/)
          HDFS-3157. Fix a bug in the case that the generation stamps of the stored block in a namenode and the reported block from a datanode do not match. Contributed by Ashish Singhi (Revision 1356086)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356086
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1093 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1093/)
          HDFS-3157. Fix a bug in the case that the generation stamps of the stored block in a namenode and the reported block from a datanode do not match. Contributed by Ashish Singhi (Revision 1356086)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356086
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2436 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2436/)
          HDFS-3157. Fix a bug in the case that the generation stamps of the stored block in a namenode and the reported block from a datanode do not match. Contributed by Ashish Singhi (Revision 1356086)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356086
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2419 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2419/)
          HDFS-3157. Fix a bug in the case that the generation stamps of the stored block in a namenode and the reported block from a datanode do not match. Contributed by Ashish Singhi (Revision 1356086)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356086
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2487 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2487/)
          HDFS-3157. Fix a bug in the case that the generation stamps of the stored block in a namenode and the reported block from a datanode do not match. Contributed by Ashish Singhi (Revision 1356086)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356086
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Tsz Wo Nicholas Sze added a comment -

          I have committed this. Thanks, Ashish. You have done a great job!

          Uma Maheswara Rao G added a comment -

          Thanks Nicholas for the explanation.

          +1 on the patch.

          Tsz Wo Nicholas Sze added a comment -

          Hi Uma,

          Thanks for taking a look.

          The reason for using == instead of equals(..) is that one of the constructors (see below) sets stored and corrupted to the same object. So == is fine.

          +    BlockToMarkCorrupt(BlockInfo stored, String reason) {
          +      this(stored, stored, reason);
          +    }
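
          As an illustration, here is a minimal hedged sketch of the class shape implied by that constructor (field names taken from this thread; this is not the committed HDFS source):

          class BlockToMarkCorrupt {
            final BlockInfo corrupted; // block carrying the DN's (reported) genstamp
            final BlockInfo stored;    // block as stored in the blocksMap
            final String reason;

            BlockToMarkCorrupt(BlockInfo corrupted, BlockInfo stored, String reason) {
              this.corrupted = corrupted;
              this.stored = stored;
              this.reason = reason;
            }

            BlockToMarkCorrupt(BlockInfo stored, String reason) {
              this(stored, stored, reason); // both fields reference the same object
            }
          }

          Because the one-argument constructor assigns the same object to both fields, corrupted == stored holds exactly in that case, so the identity check is sufficient.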
          
          Uma Maheswara Rao G added a comment -

          Hi Nicholas,

          Latest Patch looks great. I have one comment:

           (corrupted == stored?
          

          Should this be .equals(..)? We are creating a new reference of BlockInfo explicitly in some of the ctors, right?

          And the other question is:
          if (countNodes(b.stored).liveReplicas() >= bc.getReplication()) {
          This point may not be related to this patch, but there is one case I wanted to raise.
          Due to several pipeline failures in the cluster, only 2 live replicas are present, and all the other nodes have the partial (corrupt) block in RBW.
          Now the NN cannot invalidate those blocks, since the block does not have enough replication, and it may try to replicate them to other nodes first. But unfortunately the other nodes already have the block with an older genstamp; the volumes map may already contain those blocks, and I remember it will reject the replication. So we have only 2 live replicas even though we have more DNs. This situation should be very rare, with almost no possibility in bigger clusters, but it is worth considering for small clusters. Brahma reported this on a small cluster of 5 nodes. Anyway, I will ask him to file a separate issue; we can discuss it there.

          Also Thanks a lot Ashish for your efforts on this issue

          Thanks
          Uma

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12533840/HDFS-3157-5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2718//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2718//console

          This message is automatically generated.

          Ashish Singhi added a comment -

          The patch provided by Nicholas is clearer, so I am attaching the same patch as Nicholas but with the test case.
          Thanks Nicholas

          Ashish Singhi added a comment -

          Thanks a lot Nicholas for your time and reviewing the patch.

          1. The new BlockInfo(storedBlock) constructor won't copy triplets. So the blockInfo in BlockToMarkCorrupt has the GS in the DN but doesn't have locations.

          Yes, the blockInfo in BlockToMarkCorrupt has only the GS in the DN but doesn't have the locations.

          2. In markBlockAsCorrupt(..), since the location could be empty and the GS could be different from the one in the blocksMap, we look up the block again.

          While looking into the blocksMap for a block we check only the blockId (which will be the same for both the reported block and the storedBlock here) and not the GS. So we will still have the locations of the storedBlock.

          could you combine it with your test if you think they are good?

          I went through the patch and it looked good; I will upload a patch tomorrow with the test case.

          Tsz Wo Nicholas Sze added a comment -

          See if I understand the patch correctly:

          1. The new BlockInfo(storedBlock) constructor won't copy triplets. So the blockInfo in BlockToMarkCorrupt has the GS in the DN but doesn't have locations.
          2. In markBlockAsCorrupt(..), since the location could be empty and the GS could be different from the one in the blocksMap, we look up the block again.

          If my understanding is correct, I have the following suggestions:

          • Add storedBlock to BlockToMarkCorrupt so that no additional lookup is required.
          • We have to be very careful about when to use the block with the stored gs and when to use the block with the reported gs. In markBlockAsCorrupt(..), calls to addToCorruptReplicasMap and addToInvalidates should pass the block with the DN's gs. Other calls (addBlock, countNodes, updateNeededReplications) should pass the block with the stored gs. Similar changes have to be made in invalidateBlock(..).

          It is lengthy to describe all the changes. So I put them in h3157_20120618.patch. Ashish, could you combine it with your test if you think they are good?


          I think there are similar bugs in processMisReplicatedBlock(..) and the related code, since they do not handle the case where the generation stamps are different. This is new code introduced for HA. Let's fix it separately.
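
          A hedged restatement of the genstamp rule as code, overlaid on the markBlockAsCorrupt(..) structure quoted elsewhere in this thread (expectedReplicas stands in for bc.getReplication(); in the actual patch the invalidation path goes through invalidateBlock(..), so this is an illustration, not the committed diff):

          // stored GS: replica bookkeeping against the blocksMap
          node.addBlock(b.stored);
          // reported (DN's) GS: the corrupt-replica entry must match the
          // replica actually sitting on the DN
          corruptReplicas.addToCorruptReplicasMap(b.corrupted, node, b.reason);
          // stored GS: replica counting decides invalidation vs re-replication
          if (countNodes(b.stored).liveReplicas() >= expectedReplicas) {
            addToInvalidates(b.corrupted, node); // reported GS goes to the DN
          } else {
            updateNeededReplications(b.stored, -1, 0);
          }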

          Ashish Singhi added a comment -

          Added a sleep of 100 ms in each while loop in the test case, to avoid a single thread taking most of the CPU usage.

          Tsz Wo Nicholas Sze added a comment -

          Hi Uma, I will take a look at the patch tomorrow.

          Uma Maheswara Rao G added a comment -
          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
          
          org.apache.hadoop.hdfs.TestFileLengthOnClusterRestart
          
          

          Unrelated to this patch.

          Nicholas, do you have any more comments on this patch?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12531062/HDFS-3157-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestFileLengthOnClusterRestart

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2641//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2641//console

          This message is automatically generated.

          Uma Maheswara Rao G added a comment -

          Thanks a lot Nicholas.

          Even though I have a login for Jenkins, I am not able to see the Build Now option.

          BTW, do you have any comments on this patch?

          Tsz Wo Nicholas Sze added a comment -

          Hi Uma, I have just started a build.

          Uma Maheswara Rao G added a comment -

          The test failure seems to be due to HDFS-3492, which has now been reverted.
          Ashish, could you please reattach the patch for a clean QA report?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12531062/HDFS-3157-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestShortCircuitLocalRead

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2615//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2615//console

          This message is automatically generated.

          Ashish Singhi added a comment -

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestListFilesInFileContext
          org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics

          Not related to the patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12531062/HDFS-3157-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestListFilesInFileContext
          org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2604//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2604//console

          This message is automatically generated.

          Ashish Singhi added a comment -

          Re-submitting the patch, as QA did not run any tests due to a 'cmake' error.

          Ashish Singhi added a comment -

          Uploaded the patch addressing Uma's comment.

          Now, since we are adding the datanode that reports the corrupt block to the triplets of the storedBlock (the block in the blocksMap), there is no need to copy the triplets of the storedBlock to the reported corrupt block. I have therefore removed all the changes made in the BlockInfo class.

          Can someone please review the patch?

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12530773/HDFS-3157-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2580//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2580//console

          This message is automatically generated.

          Ashish Singhi added a comment -

          Thanks Uma.

          Yes, you're right. I need to handle this case; I will upload a patch addressing it.

          Uma Maheswara Rao G added a comment -

          I think you have to handle one more case:

           // Add replica to the data-node if it is not already there
              node.addBlock(storedBlock);
          
              // Add this replica to corruptReplicas Map
              corruptReplicas.addToCorruptReplicasMap(storedBlock, node, reason);
              if (countNodes(storedBlock).liveReplicas() >= bc.getReplication()) {
                // the block is over-replicated so invalidate the replicas immediately
                invalidateBlock(storedBlock, node);
              } else if (namesystem.isPopulatingReplQueues()) {
                // add the block to neededReplication
                updateNeededReplications(storedBlock, -1, 0);
              }
          

          Here you are adding storedBlock, which has the reported genstamp (assume the genstamp is 1).
          When invalidateBlock is called, it will try to remove the newer-genstamp block from the node, because blocksMap#removeNode will look up the block again in the blocksMap.

           if (!blocksMap.removeNode(block, node)) {
                  if(NameNode.stateChangeLog.isDebugEnabled()) {
                    NameNode.stateChangeLog.debug("BLOCK* removeStoredBlock: "
                        + block + " has already been removed from node " + node);
                  }
                  return;
                }
          

          How about adding the block which is present in the blocksMap, so that the block can be removed successfully when blocksMap.removeNode is called?

          Ashish Singhi added a comment -

          A correction to my earlier comment:

          In System.arraycopy it will create a new reference

          System.arraycopy(...) will not create any new reference. What I wanted to say was that if we use System.arraycopy(...), any changes done in this.triplets will not be reflected in from.triplets, as the two will point to different locations in memory.
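
          To make the aliasing point concrete, a self-contained sketch (plain Object arrays stand in for the BlockInfo triplets array; this is an illustration, not HDFS code):

          Object[] from = new Object[3];

          // Aliasing: both variables refer to the same array, so a write
          // through one is visible through the other.
          Object[] shared = from;
          shared[0] = "dn1";
          System.out.println(from[0]);   // prints dn1

          // Copying: the destination is a distinct array, so later writes
          // to the copy are not reflected in the source.
          Object[] copy = new Object[from.length];
          System.arraycopy(from, 0, copy, 0, from.length);
          copy[1] = "dn2";
          System.out.println(from[1]);   // prints null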

          Ashish Singhi added a comment -

          I forgot to mention that I have used

          +      this.triplets = from.triplets;

          instead of

          +      System.arraycopy(from.triplets, 0, this.triplets, 0, from.triplets.length);

          In System.arraycopy it will create a new reference. So the problem is: in markBlockAsCorrupt(...), at node.addBlock(storedBlock), we add the datanode into the triplets of the corrupt block, but when we call countNodes(...) and look up the storedBlock in the blocksMap, it returns an iterator over only one datanode, i.e., the one holding the live replica.
          To avoid this I have used this.triplets = from.triplets, so that both point to the same location and the problem described above does not occur.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12529856/HDFS-3157-2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2523//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2523//console

          This message is automatically generated.

          Ashish Singhi added a comment -

          Updated the patch, addressing Nicholas's comment.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12528379/HDFS-3157-1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2492//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2492//console

          This message is automatically generated.

          Ashish Singhi added a comment -

          Thanks Nicholas for reviewing.
          Actually, while uploading I attached a wrong patch.
          It should have been

          +    BlockInfo blkInfo = new BlockInfo(blk, storedBlock
          +        .getBlockCollection().getReplication());
          

          but it was

          +    BlockInfo blkInfo = new BlockInfo(storedBlock, storedBlock
          +        .getBlockCollection().getReplication());
          

          my bad

          @Nicholas - Sorry for the mistake and your time. Now I will not be able to address your comment.

          Tsz Wo Nicholas Sze added a comment -

          Hi Ashish,

          Thanks for the update. I think it is better to use BlockInfo(BlockInfo from) instead of adding getTriplets() and setTriplets(..). We may change BlockInfo(BlockInfo from) as follows:

          -  protected BlockInfo(BlockInfo from) {
          +  protected BlockInfo(BlockInfo from, boolean copyLocations) {
               this(from, from.bc.getReplication());
               this.bc = from.bc;
          +    if (copyLocations) {
          +      System.arraycopy(from.triplets, 0, this.triplets, 0, from.triplets.length);
          +    }
             }
          
          Ashish Singhi added a comment -

          TestReplicationPolicy is passing locally for me with the patch.

          Running org.apache.hadoop.hdfs.server.blockmanagement.TestReplicationPolicy
          Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.038 sec
          
          Results :
          
          Tests run: 10, Failures: 0, Errors: 0, Skipped: 0
          
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12528043/HDFS-3157-1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestReplicationPolicy

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2473//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2473//console

          This message is automatically generated.

          Ashish Singhi added a comment -

          Patch updated and ready for review.
          Please provide your review comments.

          +      /*
          +       * Look up again to storedBlock as it might be a reported block also.
          +       * @see BlockManager#checkReplicaCorrupt(...)
          +       */
          +      BlockInfo blkInfo = blocksMap.getStoredBlock(storedBlock);
          

          I have added this because, in updateNeededReplications, we want the namenode to ask a datanode to replicate the storedBlock that is in its blocksMap, not the reported block that the datanode is reporting as corrupt.

In the test case I am asserting three things:
First - there should be one block in the corruptReplicasMap.
Second - after marking the block as corrupt, the ReplicationMonitor thread should replicate a live replica to one of the datanodes.
Third - after replicating the live replica, the corrupt replica in corruptReplicasMap should get invalidated.
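
To illustrate why the second lookup matters, here is a small standalone toy (toy types only, not the actual BlocksMap API). Since the map is keyed by block ID alone, resolving the DN-reported replica through it always hands back the NN-side genstamp:

import java.util.HashMap;
import java.util.Map;

public class StoredBlockLookup {
  // Toy stand-in for BlockInfo: identified by blockId, carries a genstamp.
  record BlockInfo(long blockId, long genStamp) {}

  public static void main(String[] args) {
    // Toy blocksMap keyed by blockId only.
    Map<Long, BlockInfo> blocksMap = new HashMap<>();
    blocksMap.put(42L, new BlockInfo(42L, 1003)); // NN holds genstamp 1003

    // A DN reports the same block with a stale genstamp (1002).
    BlockInfo reported = new BlockInfo(42L, 1002);

    // Re-resolving the reported block against the map yields the stored
    // block (genstamp 1003) -- the one replication work should be
    // scheduled for, not the stale reported replica.
    BlockInfo stored = blocksMap.get(reported.blockId());
    System.out.println("reported genstamp = " + reported.genStamp()
        + ", stored genstamp = " + stored.genStamp());
  }
}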

          Ashish Singhi added a comment -

Currently I am working on the following solution for the patch - rebuilding the blockInfo with just the reported block's genstamp, keeping all other state the same as storedBlock.
Even with this solution the test case may fail randomly. Reason:
Although the reported block is added to the corruptReplicasMap, it does not get invalidated on the DN that reported it, because a corrupt replica is invalidated only after the number of live replicas for the block reaches the configured replication factor.
Problem - If chooseTarget() picks the same DN that reported the corrupt block, the replication fails with ReplicaAlreadyExistsException.
Now the question is why the NN picks the same DN that reported the corrupt block rather than the third DN.
Answer - The excludedNodes map contains only the one DN that has the live replica of the block (i.e. the DN that has the block in its finalized folder).
The following partial logs depict the above scenario.

          excludedNodes contains the following datanode/s.
          {127.0.0.1:54681=127.0.0.1:54681}
          2012-05-12 23:57:33,773 INFO  hdfs.StateChange (BlockManager.java:computeReplicationWorkForBlocks(1226)) - BLOCK* ask 127.0.0.1:54681 to replicate blk_3471690017167574595_1003 to datanode(s) 127.0.0.1:54041
          2012-05-12 23:57:33,791 INFO  datanode.DataNode (DataNode.java:transferBlock(1221)) - DatanodeRegistration(127.0.0.1, storageID=DS-1047816814-192.168.44.128-54681-1336847251649, infoPort=62840, ipcPort=26036, storageInfo=lv=-40;cid=testClusterID;nsid=1646783488;c=0) Starting thread to transfer block BP-1770179175-192.168.44.128-1336847247907:blk_3471690017167574595_1003 to 127.0.0.1:54041
          2012-05-12 23:57:33,795 INFO  hdfs.StateChange (BlockManager.java:processReport(1450)) - BLOCK* processReport: from DatanodeRegistration(127.0.0.1, storageID=DS-1047816814-192.168.44.128-54681-1336847251649, infoPort=62840, ipcPort=26036, storageInfo=lv=-40;cid=testClusterID;nsid=1646783488;c=0), blocks: 1, processing time: 0 msecs
          2012-05-12 23:57:33,796 INFO  datanode.DataNode (BPServiceActor.java:blockReport(404)) - BlockReport of 1 blocks took 0 msec to generate and 2 msecs for RPC and NN processing
          2012-05-12 23:57:33,796 INFO  datanode.DataNode (BPServiceActor.java:blockReport(423)) - sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@12eb0b3
          2012-05-12 23:57:33,811 INFO  datanode.DataNode (DataXceiver.java:writeBlock(342)) - Receiving block BP-1770179175-192.168.44.128-1336847247907:blk_3471690017167574595_1003 src: /127.0.0.1:33583 dest: /127.0.0.1:54041
          2012-05-12 23:57:33,812 INFO  datanode.DataNode (DataXceiver.java:writeBlock(495)) - opWriteBlock BP-1770179175-192.168.44.128-1336847247907:blk_3471690017167574595_1003 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1770179175-192.168.44.128-1336847247907:blk_3471690017167574595_1003 already exists in state RBW and thus cannot be created.
          2012-05-12 23:57:33,814 ERROR datanode.DataNode (DataXceiver.java:run(193)) - 127.0.0.1:54041:DataXceiver error processing WRITE_BLOCK operation  src: /127.0.0.1:33583 dest: /127.0.0.1:54041
          org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1770179175-192.168.44.128-1336847247907:blk_3471690017167574595_1003 already exists in state RBW and thus cannot be created.
                  at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:795)
                  at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1)
                  at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:151)
                  at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:365)
                  at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
                  at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
                  at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189)
                  at java.lang.Thread.run(Thread.java:619)
          2012-05-12 23:57:33,815 INFO  datanode.DataNode (DataNode.java:run(1406)) - DataTransfer: Transmitted BP-1770179175-192.168.44.128-1336847247907:blk_3471690017167574595_1003 (numBytes=100) to /127.0.0.1:54041
          2012-05-12 23:57:34,066 INFO  hdfs.StateChange (BlockManager.java:processReport(1450)) - BLOCK* processReport: from DatanodeRegistration(127.0.0.1, storageID=DS-610636930-192.168.44.128-20029-1336847250644, infoPort=52843, ipcPort=46734, storageInfo=lv=-40;cid=testClusterID;nsid=1646783488;c=0), blocks: 0, processing time: 0 msecs
          2012-05-12 23:57:34,067 INFO  datanode.DataNode (BPServiceActor.java:blockReport(404)) - BlockReport of 0 blocks took 0 msec to generate and 3 msecs for RPC and NN processing
          2012-05-12 23:57:34,068 INFO  datanode.DataNode (BPServiceActor.java:blockReport(423)) - sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@a1364a
          2012-05-12 23:57:34,099 INFO  hdfs.StateChange (CorruptReplicasMap.java:addToCorruptReplicasMap(66)) - BLOCK NameSystem.addToCorruptReplicasMap: blk_3471690017167574595 added as corrupt on 127.0.0.1:54041 by /127.0.0.1 because reported RBW replica with genstamp 1002 does not match COMPLETE block's genstamp in block map 1003
          2012-05-12 23:57:34,100 INFO  hdfs.StateChange (BlockManager.java:processReport(1450)) - BLOCK* processReport: from DatanodeRegistration(127.0.0.1, storageID=DS-1452741455-192.168.44.128-54041-1336847250645, infoPort=10314, ipcPort=16230, storageInfo=lv=-40;cid=testClusterID;nsid=1646783488;c=0), blocks: 1, processing time: 2 msecs
          2012-05-12 23:57:34,101 INFO  datanode.DataNode (BPServiceActor.java:blockReport(404)) - BlockReport of 1 blocks took 0 msec to generate and 4 msecs for RPC and NN processing
          2012-05-12 23:57:34,101 INFO  datanode.DataNode (BPServiceActor.java:blockReport(423)) - sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@17194a4
          2012-05-12 23:57:34,775 INFO  hdfs.StateChange (BlockManager.java:computeReplicationWorkForBlocks(1096)) - BLOCK* Removing block blk_3471690017167574595_1003 from neededReplications as it has enough replicas. 
          

Here you can observe that the NN picks for replication the same DN, 127.0.0.1:54041, that is reporting the corrupt block, while the excludedNodes map contains only the one DN, 127.0.0.1:54681, that has the live replica (printed on the first line of the logs).

Is there any way to add the DN that reports the corrupt block to the excludedNodes map?
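
Something like the following standalone toy (illustrative only, not the real chooseTarget() signature) shows the behaviour I am after:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExcludeReportingNode {
  // Toy target chooser: picks the first live node not in excludedNodes.
  static String chooseTarget(List<String> liveNodes, Set<String> excludedNodes) {
    for (String node : liveNodes) {
      if (!excludedNodes.contains(node)) {
        return node;
      }
    }
    return null; // no eligible target
  }

  public static void main(String[] args) {
    List<String> liveNodes =
        List.of("127.0.0.1:54681", "127.0.0.1:54041", "127.0.0.1:20029");

    Set<String> excludedNodes = new HashSet<>();
    excludedNodes.add("127.0.0.1:54681"); // DN holding the live replica
    // The exclusion being asked for: the DN that reported the corrupt
    // RBW replica, so replication is never scheduled back onto it.
    excludedNodes.add("127.0.0.1:54041");

    // With both exclusions the third DN is chosen, and the
    // ReplicaAlreadyExistsException on the reporting DN cannot occur.
    System.out.println("target = " + chooseTarget(liveNodes, excludedNodes));
  }
}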

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1040 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1040/)
          Reverting (Need to re-do the patch. new BlockInfo does not set iNode ) HDFS-3157. Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened. (Revision 1336572)

          Result = FAILURE
          umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1336572
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Uma Maheswara Rao G added a comment -

> One potential issue with this patch:
> Because it creates a new BlockInfo object, that BlockInfo doesn't have any pointer to the associated inode. Hence when we call markBlockAsCorrupt, it doesn't go through the normal corrupt replica handling path – instead, it gets immediately enqueued for deletion.

You are right. In fact, we have reverted this patch for exactly that reason.

> This makes me a little bit nervous – if we had a bug, for example, which caused the NN's view of the gen stamp to be increased without the DNs' gen stamps being increased, we would issue deletions for all replicas. If instead we were going through the normal corrupt replica handling path, it would first make sure it had good replicas of the "correct" genstamp before invalidating the corrupt replicas. That would prevent the data loss, turning it into an unavailability instead.
>
> Does that make sense?

Right, it makes sense to me. We should go through the normal corrupt replica handling flow. We will update the patch for that soon.

          Todd Lipcon added a comment -

          One potential issue with this patch:
          Because it creates a new BlockInfo object, that BlockInfo doesn't have any pointer to the associated inode. Hence when we call markBlockAsCorrupt, it doesn't go through the normal corrupt replica handling path – instead, it gets immediately enqueued for deletion.

This makes me a little bit nervous – if we had a bug, for example, which caused the NN's view of the gen stamp to be increased without the DNs' gen stamps being increased, we would issue deletions for all replicas. If instead we were going through the normal corrupt replica handling path, it would first make sure it had good replicas of the "correct" genstamp before invalidating the corrupt replicas. That would prevent the data loss, turning it into an unavailability instead.
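
To make the invariant concrete, here is a standalone toy sketch of that guard (names are illustrative, not the BlockManager's):

import java.util.List;

public class CorruptReplicaGuard {
  record Replica(String dn, long genStamp) {}

  // Toy version of the invariant: corrupt replicas may be invalidated
  // only once enough live replicas with the expected genstamp exist.
  static boolean mayInvalidateCorrupt(List<Replica> replicas,
      long expectedGenStamp, int replicationFactor) {
    long live = replicas.stream()
        .filter(r -> r.genStamp() == expectedGenStamp)
        .count();
    return live >= replicationFactor;
  }

  public static void main(String[] args) {
    List<Replica> replicas = List.of(
        new Replica("dn1", 1002),  // stale replica the NN thinks is corrupt
        new Replica("dn2", 1003)); // only one replica with the NN's genstamp

    // With replication factor 2 the corrupt replica must not be deleted
    // yet; if the NN's genstamp view were wrong, deleting eagerly would
    // destroy the last copies (data loss instead of unavailability).
    System.out.println(mayInvalidateCorrupt(replicas, 1003, 2)); // false
  }
}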

          Does that make sense?

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1075 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1075/)
          Reverting (Need to re-do the patch. new BlockInfo does not set iNode ) HDFS-3157. Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened. (Revision 1336572)

          Result = SUCCESS
          umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1336572
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2237 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2237/)
          Reverting (Need to re-do the patch. new BlockInfo does not set iNode ) HDFS-3157. Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened. (Revision 1336572)

          Result = ABORTED
          umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1336572
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2220 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2220/)
          Reverting (Need to re-do the patch. new BlockInfo does not set iNode ) HDFS-3157. Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened. (Revision 1336572)

          Result = SUCCESS
          umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1336572
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2295 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2295/)
          Reverting (Need to re-do the patch. new BlockInfo does not set iNode ) HDFS-3157. Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened. (Revision 1336572)

          Result = SUCCESS
          umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1336572
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Uma Maheswara Rao G added a comment -

Yes, Nicholas, thanks a lot for checking this. It will actually not mark the block as corrupt because of that inode check. We may have to rebuild the blockInfo with just the reported block's genstamp, keeping all other state the same as storedBlock. Let's fix this in the next patch. I have just reverted the changes.
Ashish is working on it.

          Tsz Wo Nicholas Sze added a comment -

          The patch actually does not work since the INode in the new BlockInfo is not set. Let's revert it and re-do the patch.

          2012-05-09 16:05:29,960 INFO  hdfs.StateChange (BlockManager.java:markBlockAsCorrupt(926)) - BLOCK markBlockAsCorrupt:
            block blk_-6891652617965059210_1002 could not be marked as corrupt as it does not belong to any file
          
          Tsz Wo Nicholas Sze added a comment -

          I think we only have to fix the test. The behavior of dn0 is expected. DirectoryScanner.reconcile() should be able to remove the block from the replica map later on.
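
As a rough standalone model of what reconcile() should achieve here (toy structures, not the actual DirectoryScanner code):

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ReconcileSketch {
  public static void main(String[] args) {
    // Toy in-memory replica map (blockId -> genstamp), like volumeMap.
    Map<Long, Long> volumeMap = new HashMap<>();
    volumeMap.put(42L, 1003L); // block file was deleted on disk
    volumeMap.put(43L, 1001L); // block file still present

    // Toy view of what the directory scan actually found on disk.
    Set<Long> blocksOnDisk = Set.of(43L);

    // Reconcile: an entry with no backing file is dropped from memory,
    // so a later invalidate command no longer hits a missing replica.
    volumeMap.keySet().removeIf(blockId -> !blocksOnDisk.contains(blockId));

    System.out.println(volumeMap); // prints {43=1001}
  }
}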

          Tsz Wo Nicholas Sze added a comment -

Here is the reason for the TestRBWBlockInvalidation failure:
The block and meta files are deleted in dn0, but the block is still in the replica map (FsDatasetImpl.volumeMap). When replication happens, it fails because the block is still in the replica map, so it throws a ReplicaAlreadyExistsException. Therefore, the number of live replicas remains 2.

> ... I am wondering, we got +1 here from QA.

I also don't understand why Jenkins +1'ed it. It seems that the test should always fail.

          Tsz Wo Nicholas Sze added a comment -

          When replication happens, somehow the replica already exists.

          //TestRBWBlockInvalidation output
          
          2012-05-09 12:30:04,122 INFO  datanode.DataNode (DataXceiver.java:writeBlock(495))
           - opWriteBlock BP-2087796974-10.10.11.90-1336591801017:blk_-571802999240948417_1003 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException:
             Block BP-2087796974-10.10.11.90-1336591801017:blk_-571802999240948417_1003 already exists in state RBW and thus cannot be created.
          
          Uma Maheswara Rao G added a comment -

Hi John, thanks for digging into HDFS-3391. I am wondering how we got a +1 here from QA.
Anyway, I have discussed it with Ashish and asked him to take a look. Let him find the actual cause.

          John George added a comment -

          I believe this JIRA broke TestPipelinesFailover as stated in HDFS-3391. Could you take a look?

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1074 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1074/)
          HDFS-3157. Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened. Contributed by Ashish Singhi. (Revision 1335719)

          Result = SUCCESS
          umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1335719
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1039 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1039/)
          HDFS-3157. Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened. Contributed by Ashish Singhi. (Revision 1335719)

          Result = FAILURE
          umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1335719
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2226 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2226/)
          HDFS-3157. Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened. Contributed by Ashish Singhi. (Revision 1335719)

          Result = SUCCESS
          umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1335719
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12526038/HDFS-3157.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2391//console

          This message is automatically generated.

          Uma Maheswara Rao G added a comment -

          I have committed this to trunk and branch-2. Thanks a lot Ashish for the contribution!
          Thanks Nicholas, for the review!

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2209 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2209/)
          HDFS-3157. Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened. Contributed by Ashish Singhi. (Revision 1335719)

          Result = SUCCESS
          umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1335719
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Uma Maheswara Rao G added a comment -

Resolved conflicts with trunk and removed one unnecessary empty line in the license header:

 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2284 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2284/)
          HDFS-3157. Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened. Contributed by Ashish Singhi. (Revision 1335719)

          Result = SUCCESS
          umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1335719
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java
          Ashish Singhi added a comment -

          Thanks a lot Uma and Nicholas for reviewing the patch.

          Uma Maheswara Rao G added a comment -

          Thanks Nicholas, for the clarification.
I will commit the patch sometime today.

          -1 javadoc. The javadoc tool appears to have generated 16 warning messages.

The javadoc warnings are unrelated to this patch.

          Thanks a lot, Ashish for the patch.

          Tsz Wo Nicholas Sze added a comment -

          I think it is a bug and the fix is correct. Good work!

          +1 on the patch.

          Uma Maheswara Rao G added a comment -

Hi Ashish, the patch makes sense to me. Let me get some clarification on the old behaviour.

@Nicholas, do you have any idea why we use storedBlock for marking the block as corrupt when the genstamps mismatch? Ideally the DN may not be able to find that stored block if its genstamp differs from the block in its volumeMap. Is there any specific reason for it?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12525165/HDFS-3157.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          -1 javadoc. The javadoc tool appears to have generated 16 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2352//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2352//console

          This message is automatically generated.

          Ashish Singhi added a comment -

          Formatted properly.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12523172/HDFS-3157.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2295//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2295//console

          This message is automatically generated.

          Ashish Singhi added a comment -

          Patch submitted.
          Please review and provide any comments or suggestions.

          Ashish Singhi added a comment -

After the block is deleted on DN1, the pipeline updates the gen stamp of the block, say from blk_blockId_1002 to blk_blockId_1003.
DN1 then reports the block with the old gen stamp, which gets marked as corrupt.
In BlockManager#processReportedBlock(), storedBlock gets assigned to blk_blockId_1003 because the blocksMap is now updated with the new gen stamp for this blockId, and the NN then asks DN1 to delete blk_blockId_1003.
Since DN1's volumeMap does not contain blk_blockId_1003, it throws an exception.
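
The failure mode is easy to reproduce with a standalone toy (illustrative types only): the NN-side map resolves by block ID and hands back the new gen stamp, while the DN-side delete insists on a gen stamp match, so the command can never succeed and keeps being reissued:

import java.util.HashMap;
import java.util.Map;

public class GenStampMismatch {
  public static void main(String[] args) {
    // DN1's toy volumeMap: still holds the replica with genstamp 1002.
    Map<Long, Long> dn1VolumeMap = new HashMap<>();
    dn1VolumeMap.put(42L, 1002L);

    // The NN resolved the stale report through its blocksMap (keyed by
    // blockId) and therefore issues a delete for genstamp 1003.
    long blockId = 42L;
    long genStampToDelete = 1003L;

    Long found = dn1VolumeMap.get(blockId);
    if (found == null || found != genStampToDelete) {
      // Mirrors "Unexpected error trying to delete block ...
      // BlockInfo not found in volumeMap" -- and it repeats on every
      // heartbeat, because nothing ever removes the stale entry.
      System.out.println("Error in deleting blocks: blk_" + blockId
          + "_" + genStampToDelete + " not found in volumeMap");
    }
  }
}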

          Uma Maheswara Rao G added a comment -

          Hi Andreina,

It would be good if we keep the description field short and add further details as comments. This avoids generating big emails on every update to this issue.

          Thanks
          Uma


People

• Assignee: Ashish Singhi
• Reporter: J.Andreina
• Votes: 0
• Watchers: 15