Hadoop HDFS / HDFS-2791

If block report races with closing of file, replica is incorrectly marked corrupt

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0, 0.23.0
    • Fix Version/s: 0.23.1
    • Component/s: datanode, namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The following sequence of events results in a replica being mistakenly marked corrupt:
      1. A pipeline is open with 2 replicas.
      2. DN1 generates a block report but is slow in sending it to the NN (e.g. because of a flaky network). It gets "stuck" right before the block report RPC.
      3. The client closes the file.
      4. DN2 is fast and sends blockReceived to the NN. The NN marks the block as COMPLETE.
      5. DN1's block report proceeds, and includes the block in an RBW state.
      6. The NN incorrectly marks the replica as corrupt, since it is an RBW replica on a COMPLETE block.
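
As a rough illustration, the sequence above can be replayed against a toy model. This is only a sketch: `ToyNameNode`, its methods, and `run_race` are hypothetical names invented for this example, and the rule in `process_block_report` is a simplification of the behaviour described in this issue, not actual NameNode code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical, heavily simplified NameNode tracking a single block."""
    block_state: str = "UNDER_CONSTRUCTION"
    locations: set = field(default_factory=set)
    corrupt: set = field(default_factory=set)

    def block_received(self, dn: str) -> None:
        # A DN reports a finalized replica, so the NN records its location.
        self.locations.add(dn)

    def complete_block(self) -> None:
        # The client has closed the file and the block becomes COMPLETE.
        self.block_state = "COMPLETE"

    def process_block_report(self, dn: str, replica_state: str) -> None:
        # Behaviour described in this issue: an RBW replica reported for a
        # COMPLETE block is treated as corrupt, with no allowance for delay.
        if self.block_state == "COMPLETE" and replica_state == "RBW":
            self.corrupt.add(dn)
        else:
            self.locations.add(dn)


def run_race() -> ToyNameNode:
    nn = ToyNameNode()
    delayed_report = ("DN1", "RBW")  # step 2: report built, then stuck
    nn.block_received("DN2")         # steps 3-4: file closed, DN2 is fast
    nn.complete_block()              # step 4: NN marks the block COMPLETE
    nn.process_block_report(*delayed_report)  # steps 5-6: stale report lands
    return nn


if __name__ == "__main__":
    result = run_race()
    print("locations:", result.locations, "corrupt:", result.corrupt)
```

In this model DN1 ends up in the corrupt set even though its replica would have been finalized by the time the report arrived, which is the symptom reported here.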

      1. hdfs-2791-test.txt
        8 kB
        Todd Lipcon
      2. hdfs-2791.txt
        11 kB
        Todd Lipcon
      3. hdfs-2791.txt
        11 kB
        Todd Lipcon
      4. hdfs-2791.txt
        11 kB
        Todd Lipcon
      5. hdfs-2791.txt
        10 kB
        Eli Collins

          Activity

          Todd Lipcon added a comment -

          To start brainstorming a solution, here are a few scattered thoughts:

          • The basic requirement is that we should always accept a "past" valid state of the block, since we don't have any global barrier before the block is marked complete. So we need to support RBW-before-finalized even if it has a too-short length, for example. There might be other bugs similar to this if the file is re-opened for append (e.g. a block is reported with a too-old generation stamp racing with the re-open).
          • I think this can only happen in the first block report after a block's state changes, since each block report freshly examines the DN state. So maybe we can use the block report timestamps to our advantage somehow? I.e., if the block changed state between the time the previous block report was received and this one, the BR might have raced.

          The floor is open for creative simple solutions

          Todd Lipcon added a comment -

          Test illustrating the bug

          Todd Lipcon added a comment -

          I've been thinking about this over the weekend and this morning. My current thinking is that the safest bet is the following approach:

          When an RBW block is reported for a finalized replica:

          • Case 1) if the block has a too-low generation stamp, mark it corrupt.
          • Case 2) if the block has the correct generation stamp, ignore it (don't add to block locations or mark it corrupt)

          Here's the reasoning:

          Case 1 One of the DNs is reporting a stale generation stamp.

          This means that the client must have either appended to the block or undergone pipeline recovery. There are two possibilities of why the DN is thus reporting an old genstamp:

          • 1a) it is a "delayed block report" as described in this JIRA. We will later see a correct/up-to-date BR for the same block.

          Here it is OK to mark the block as corrupt, since when we send the "invalidate" message to the DN, we'll invalidate the old genstamp specifically. So when the DN receives the invalidation, it will not delete the new (correct) replica, but rather just ignore the request.

          • 1b) the client lost its connection to this DN and did a pipeline recovery before closing the file. In this case we will never see a correct/up-to-date BR.

          Here it's also OK to mark it as corrupt, because it really is corrupt (ie didn't participate in the block recovery).

          Case 2 correct generation stamp, but RBW report on a FINALIZED block

          As far as I can tell, the only way we can get here is with the "delayed report" scenario described in this JIRA. The reasoning is as follows:

          • in order for the client to call completeBlock(), it must have gotten a successful pipeline close from all of the DNs in the current pipeline
          • if the pipeline nodes had changed, it would have gotten a different generation stamp. So, all of the nodes that have a block with the correct genstamp were in the closed pipeline
          • thus all of the nodes with the correct genstamp would have the correct length and state, and any report otherwise is because of a message delay.

          The only other possibility is something like a machine crash after which the ext3 journal is not replayed, causing some blocks to be rolled back to a prior state. In this case, upon restart, the DN would change the replica to RWR (replica waiting to be recovered) and we could use the original logic of marking it corrupt.

          I think the above solution is safer and simpler than any other solutions I could come up with.
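
The two cases above can be sketched as a small decision function. This is an illustration under stated assumptions, not the attached patch: `Action`, `handle_rbw_report`, and the `NOT_COVERED` outcome are hypothetical names, and generation stamps are modelled as plain integers.

```python
from enum import Enum


class Action(Enum):
    MARK_CORRUPT = "mark corrupt"  # Case 1
    IGNORE = "ignore"              # Case 2
    NOT_COVERED = "not covered"    # hypothetical: outside the two cases


def handle_rbw_report(block_complete: bool,
                      stored_genstamp: int,
                      reported_genstamp: int) -> Action:
    """Decide what to do with an RBW replica reported by a DN."""
    if not block_complete:
        # The rule only applies once the NN considers the block complete.
        return Action.NOT_COVERED
    if reported_genstamp < stored_genstamp:
        # Case 1: stale generation stamp (delayed report or missed recovery).
        return Action.MARK_CORRUPT
    if reported_genstamp == stored_genstamp:
        # Case 2: correct genstamp, so assume a delayed report and do nothing.
        return Action.IGNORE
    # A genstamp newer than the NN's is not discussed in this proposal.
    return Action.NOT_COVERED


if __name__ == "__main__":
    print(handle_rbw_report(True, stored_genstamp=5, reported_genstamp=4))
    print(handle_rbw_report(True, stored_genstamp=5, reported_genstamp=5))
```

Note that the reported length is deliberately not an input: in Case 2 the replica is ignored regardless of length.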

          Todd Lipcon added a comment -

          Attached implementation of the above along with a fleshed out unit test.

          Todd Lipcon added a comment -

          There seems to be a bug in this that's making some tests time out. Please review the general approach, though.

          Todd Lipcon added a comment -

          I had a simple error – it should always return false when the block itself is incomplete. Tests seem to be passing now

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12511552/hdfs-2791.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 21 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1797//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1797//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1797//console

          This message is automatically generated.

          Todd Lipcon added a comment -

          The latest jenkins result is from the prior patch rev. Waiting on jenkins result from the new patch rev. (the old patch, while it passes trunk tests, caused a problem with one of the HA tests)

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12511560/hdfs-2791.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 21 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1798//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1798//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1798//console

          This message is automatically generated.

          Aaron T. Myers added a comment -

          Looks pretty good to me. Great job thinking through this. Just a few nits.

          1. Looks like some unnecessary imports snuck into BPOfferService.
          2. s/RBW/Rbw/ in testOneReplicaRBWReportArrivesAfterBlockCompleted
          3. "Wait for it the RPC to be sent" -> "Wait for the RPC to be sent"
          Aaron T. Myers added a comment -

          Oh, and to be clear, +1 once the above nits are addressed.

          Todd Lipcon added a comment -

          Attached patch addresses Aaron's comments above. I will commit this tomorrow if there are no objections.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12511614/hdfs-2791.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 21 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1802//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1802//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1802//console

          This message is automatically generated.

          Sanjay Radia added a comment -

          Todd,
          in case 1 and 2 you say "mark it as corrupt". Do you mean report it as corrupt or do you mean delete the replica since it is stale?

          Todd Lipcon added a comment -

          The code path calls markBlockAsCorrupt which adds it to corruptReplicasMap. This will cause another replica to get made, and then the original will be deleted – wasted work but not the end of the world.

          In the non-HA case, this is probably OK. But in the HA case, this race can happen even when a file's replication count is 1, in which case it will get marked corrupt and then no new copy can be made. The block will be corrupt until the next full block report.

          dhruba borthakur added a comment -

          > Case 1) if the block has a too-low generation stamp, mark it corrupt.

          This sounds fine to me.

          > Case 2) if the block has the correct generation stamp, ignore it (don't add to block locations or mark it corrupt)

          This looks ok at first sight. If the generation stamp and length match what the NN has, is it safe to add it to the blocksmap?

          Sanjay Radia added a comment -

          Can you explain why the situation is different for HA?

          Todd Lipcon added a comment -

          > Can you explain why the situation is different for HA?

          Yes. Imagine the following sequence, with two NNs, NNA (active) and NNB(standby):

          1) file with replication count 1 is being written
          2) DN generates a block report with RBW and starts sending it to NNB. The network has a problem and this block report doesn't arrive for a while (eg DN is temporarily partitioned from NNB, so the DN keeps retrying for minutes)
          3) file is closed
          4) NNB tails edits, sees OP_ADD and OP_CLOSE, marks the block as COMPLETE, even though it hasn't seen any replicas yet.
          5) Network partition is resolved, and NNB receives the block report with an RBW replica, mistakenly marking it as corrupt.

          Todd Lipcon added a comment -

          >> Case 2) if the block has the correct generation stamp, ignore it (don't add to block locations or mark it corrupt)
          > This looks ok at first sight. If the generation stamp and length match what the NN has, is it safe to add it to the blocksmap?

          Yes, I think so. I had originally thought about doing that, but ATM convinced me otherwise - our thinking is now that this case only occurs because of a temporary delay in transmission of the "correct" state. So we'll always expect to see the FINALIZED report come in soon after one of these RBW replicas. Given that, I think it is simpler/safer to ignore it in all cases.

          dhruba borthakur added a comment -

          +1

          Sanjay Radia added a comment -

          >Case 2 ....
          I agree that it is safer to ignore it if the gen-stamp AND length match rather than mark the replica as corrupt.

          >HA case .. block will be marked as corrupt till next full block report.
          Isn't this addressed by one of 2 events:

          • The AddBlock from the DN arrives at the NNB shortly after the block report containing RBW (but it may get lost).
          • The Op_Add contains the block location (I don't think it contains the block location).
          Todd Lipcon added a comment -

          > I agree that it is safer to ignore it if the gen-stamp AND length match rather than mark the replica as corrupt.

          It's quite likely that the length will be shorter (eg because the client kept writing after the block report was generated)

          > The Op_Add contains the block location (I don't think it contains the block location).

          Right, it doesn't include the block location.

          > The AddBlock from the DN arrives at the NNB shortly after the block report containing RBW.

          Yep, but I don't think that causes it to lose its "corrupt" status. I suppose we could add a special flag to the corrupt block entry saying "corrupt due to state" rather than "corrupt due to checksum errors", and if we receive a correct-state reported block for a "corrupt due to state" block, we unflag the corruption. Does that solution seem preferable to you?
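
For illustration only, the "corrupt due to state" flag floated above might look something like the following. Everything here is hypothetical: `CorruptReason`, `CorruptReplicas`, and their methods are invented for this sketch and do not correspond to the real corruptReplicasMap API.

```python
from enum import Enum


class CorruptReason(Enum):
    STATE = "corrupt due to state"
    CHECKSUM = "corrupt due to checksum errors"


class CorruptReplicas:
    """Hypothetical map from (block id, datanode) to a corruption reason."""

    def __init__(self) -> None:
        self._entries = {}

    def mark(self, block_id: int, dn: str, reason: CorruptReason) -> None:
        self._entries[(block_id, dn)] = reason

    def on_correct_state_report(self, block_id: int, dn: str) -> None:
        # Only state-based corruption is cleared by a later report in the
        # correct state; checksum-based corruption stays flagged.
        if self._entries.get((block_id, dn)) is CorruptReason.STATE:
            del self._entries[(block_id, dn)]

    def is_corrupt(self, block_id: int, dn: str) -> bool:
        return (block_id, dn) in self._entries


if __name__ == "__main__":
    replicas = CorruptReplicas()
    replicas.mark(1, "DN1", CorruptReason.STATE)
    replicas.on_correct_state_report(1, "DN1")
    print(replicas.is_corrupt(1, "DN1"))
```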

          Todd Lipcon added a comment -

          The downside of this other idea is that we might see corrupt blocks mentioned frequently in the log and transiently in the web UI if this happens much in practice. So, unless there's something specifically wrong with the current patch, I think it's the preferable solution.

          Eli Collins added a comment -

          I think the current patch is preferable, am +1 on it. Though per HDFS-2742 I think we're overloading the notion of a corrupt block, eg it doesn't seem like receiving a RBR for a complete block implies that block is corrupt either.

          Eli Collins added a comment -

          Same patch, updated on tip of branch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12511906/hdfs-2791.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1811//console

          This message is automatically generated.

          Sanjay Radia added a comment -

          +1 for the current patch.
          I am not happy about the HA case (i.e. 1 replica and having to wait for the next full BR).
          But let's open another jira to explore the HA case to see how we can do better.

          Sanjay Radia added a comment -

          Eli: >... overloading the notion of a corrupt block, eg it doesn't seem like receiving a RBR for a complete block implies that block is corrupt
          Eli, this patch ignores an RBW replica with the same generation stamp but marks RWR as corrupt. Do you mean RWR should not be marked as corrupt (i.e. a typo, s^RBR^RWR^)?

          Sanjay Radia added a comment -

          After reading HDFS-2742 I am coming to the conclusion that when a NN asks a DN to delete a replica, in addition to the bid and generation stamp, it should also include the state (RBW etc) known to the NN. The block is deleted only if it is in that state. I think it will catch some of the race conditions and prevent a finalized replica being incorrectly deleted.

          If folks think this is a good idea I will file a new Jira and make that change.
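
A minimal sketch of that safeguard, assuming the delete request carries the block id, generation stamp, and the replica state the NN believes the DN holds. `Replica` and `should_delete` are hypothetical names for this example, not existing DataNode code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    block_id: int
    genstamp: int
    state: str  # e.g. "RBW", "RWR", "FINALIZED"


def should_delete(local: Replica, req_block_id: int,
                  req_genstamp: int, req_state: str) -> bool:
    """Delete only if the local replica matches everything the NN named."""
    return (local.block_id == req_block_id
            and local.genstamp == req_genstamp
            and local.state == req_state)


if __name__ == "__main__":
    finalized = Replica(block_id=1, genstamp=5, state="FINALIZED")
    # The NN still believes the replica is RBW, so the delete is refused.
    print(should_delete(finalized, 1, 5, "RBW"))
```

With this check, a delete request issued on the basis of a stale RBW report would not remove a replica that has since been finalized.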

          Todd Lipcon added a comment -

          I am coming to the conclusion that when a NN asks a DN to delete a replica, in addition to the bid and generation stamp, it should also include the state (RBW etc) known to the NN. The block is deleted only if it is in that state.

          Good idea - I like this safeguard. There are already +1s on this patch, and I don't think the above safeguard is mutually exclusive with it, so let's do both for extra safety.

          Assuming this patch still applies, I'll commit it momentarily.

          Todd Lipcon added a comment -

          Committed to branch-23 and trunk.
          The branch-23 patch required a few small changes since protobuf RPC isn't merged:

          • instead of using a spy on the bpNamenode object, it uses a mock with DelegateAnswer as the default answer implementation
          • changed types from the translator implementation to the straight DatanodeProtocol

          Since the changes were simple and confined to test code, I made sure the new test passed and committed.

          Hudson added a comment -

          Integrated in Hadoop-Common-0.23-Commit #439 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/439/)
          HDFS-2791. If block report races with closing of file, replica is incorrectly marked corrupt. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236944
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/AppendTestUtil.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeAdapter.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReport.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Commit #430 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/430/)
          HDFS-2791. If block report races with closing of file, replica is incorrectly marked corrupt. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236944
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/AppendTestUtil.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeAdapter.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReport.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1609 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1609/)
          HDFS-2791. If block report races with closing of file, replica is incorrectly marked corrupt. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236945
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/AppendTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeAdapter.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReport.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1681 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1681/)
          HDFS-2791. If block report races with closing of file, replica is incorrectly marked corrupt. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236945
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/AppendTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeAdapter.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReport.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1625 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1625/)
          HDFS-2791. If block report races with closing of file, replica is incorrectly marked corrupt. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236945
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/AppendTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeAdapter.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReport.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Commit #454 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/454/)
          HDFS-2791. If block report races with closing of file, replica is incorrectly marked corrupt. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236944
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/AppendTestUtil.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeAdapter.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReport.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #939 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/939/)
          HDFS-2791. If block report races with closing of file, replica is incorrectly marked corrupt. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236945
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/AppendTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeAdapter.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReport.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Build #174 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/174/)
          HDFS-2791. If block report races with closing of file, replica is incorrectly marked corrupt. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236944
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/AppendTestUtil.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeAdapter.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReport.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #152 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/152/)
          HDFS-2791. If block report races with closing of file, replica is incorrectly marked corrupt. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236944
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/AppendTestUtil.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeAdapter.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReport.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #972 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/972/)
          HDFS-2791. If block report races with closing of file, replica is incorrectly marked corrupt. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236945
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/AppendTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeAdapter.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReport.java
          Eli Collins added a comment -

          Eli: >... overloading the notion of a corrupt block, eg it doesn't seem like receiving a RBR for a complete block implies that block is corrupt

          Eli this patch ignores a RBW with same generation stamp but marks RWR as corrupt. Do you mean RWR should not be marked as corrupt (i.e. a typo s^RBR^RWR^).

          Sanjay: correct, I made a typo, s/RBR/RWR/. I meant that we are considering RWR blocks corrupt, even though a complete block under recovery is not necessarily corrupt, right? I.e. we're not marking RWR blocks corrupt here because we think they are actually corrupt, but because we want them to be deleted. This is a separate issue from this patch, since the patch does not change the behavior for the RWR case; however, I mentioned it here because this is an example of where considering blocks corrupt that are not actually corrupt bit us. Make sense?
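
          The behavior described in this exchange can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in BlockManager.java: the names `InProgressState` and `CompleteBlockCheck` are hypothetical, and the handling of an RBW replica whose generation stamp does not match is an assumption, since the thread only states the same-generation-stamp case.

```java
/** In-progress replica states discussed in this thread; this enum is hypothetical. */
enum InProgressState { RBW, RWR }

/** Hypothetical helper: how the NN treats an in-progress replica reported for a COMPLETE block. */
final class CompleteBlockCheck {
  private CompleteBlockCheck() {}

  /**
   * Returns true if the reported replica should be marked corrupt.
   *
   * @param reported         state the DN reported for the replica
   * @param reportedGenStamp generation stamp in the block report
   * @param storedGenStamp   generation stamp the NN has for the COMPLETE block
   */
  static boolean shouldMarkCorrupt(InProgressState reported,
                                   long reportedGenStamp,
                                   long storedGenStamp) {
    if (reported == InProgressState.RBW && reportedGenStamp == storedGenStamp) {
      // Per the thread: an RBW replica with the same generation stamp is ignored,
      // because the block report probably raced with the close of the file.
      return false;
    }
    // Per the thread: RWR replicas are still marked corrupt (so they get deleted).
    // Assumption: an RBW replica with a different generation stamp is also
    // treated as corrupt; the thread does not spell this case out.
    return true;
  }
}
```

          Note that the RWR branch reflects the concern raised above: "corrupt" here means "should be deleted", not that the data is known to be bad.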


            People

            • Assignee: Todd Lipcon
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 13
