Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1197

Blocks are considered "complete" prematurely after commitBlockSynchronization or DN restart

    Details

      Description

      I saw this failure once on my internal Hudson job that runs the append tests 48 times a day:
      junit.framework.AssertionFailedError: expected:<114688> but was:<98304>
      at org.apache.hadoop.hdfs.AppendTestUtil.check(AppendTestUtil.java:112)
      at org.apache.hadoop.hdfs.TestFileAppend3.testTC2(TestFileAppend3.java:116)

      1. testTC2-failure.txt
        131 kB
        Todd Lipcon
      2. hdfs-1197-test-changes.txt
        10 kB
        Todd Lipcon
      3. hdfs-1197.txt
        27 kB
        Todd Lipcon
      4. HDFS-1197-without-addStoredBlock-change.1.patch
        15 kB
        Jitendra Nath Pandey
      5. HDFS-1197-20s.2.patch
        26 kB
        Jitendra Nath Pandey
      6. HDFS-1197-20s.3.patch
        25 kB
        Jitendra Nath Pandey

        Issue Links

          Activity

          Hide
          Todd Lipcon added a comment -

          This was committed to branch-20 a long while back

          Show
          Todd Lipcon added a comment - This was committed to branch-20 a long while back
          Hide
          Jitendra Nath Pandey added a comment -

          Committed to branch-0.20-security. Thanks to Todd.

          Show
          Jitendra Nath Pandey added a comment - Committed to branch-0.20-security. Thanks to Todd.
          Hide
          Suresh Srinivas added a comment -

          +1 for the patch.

          Show
          Suresh Srinivas added a comment - +1 for the patch.
          Hide
          Jitendra Nath Pandey added a comment -

          Incorrect patch uploaded previously.

          Show
          Jitendra Nath Pandey added a comment - Incorrect patch uploaded previously.
          Hide
          Jitendra Nath Pandey added a comment -

          Todd's original patch rebased against 20-security.

          Show
          Jitendra Nath Pandey added a comment - Todd's original patch rebased against 20-security.
          Hide
          Sanjay Radia added a comment -

          Todd, you are right.

          Show
          Sanjay Radia added a comment - Todd, you are right.
          Hide
          Todd Lipcon added a comment -

          Hi Sanjay. I'm not sure I follow your reasoning here. This patch doesn't ignore blockReceived when it beats the completeFile call in the usual case. If the blockReceived call is for the correct generation stamp, it is added to the DN usual. It only changes behavior when an old generation stamp is reported.

          Show
          Todd Lipcon added a comment - Hi Sanjay. I'm not sure I follow your reasoning here. This patch doesn't ignore blockReceived when it beats the completeFile call in the usual case. If the blockReceived call is for the correct generation stamp, it is added to the DN usual. It only changes behavior when an old generation stamp is reported.
          Hide
          Sanjay Radia added a comment -

          The original patch dealt with ignoring the blockReceived for a BBW block sent by a DN rebooting. However it, as a side effect, also ignores the blockReceived sent for a block that is finalized that races ahead of the call from the client to NN to close the block/file. Hence I think the original patch fixes a bug and introduces another.

          Jitendra observed that HDFS-1779 no longer sends the blockReceived but instead sends a bbw-report.
          Hence I think that his patch is correct. However how do you explain todd's statement that the original patch has been in production for over a year. I thought it would be quite common for a blockReceived for a finalized block to race ahead of the client's call to the NN. In this case the file would not close till the next BReport.

          Show
          Sanjay Radia added a comment - The original patch dealt with ignoring the blockReceived for a BBW block sent by a DN rebooting. However it, as a side effect, also ignores the blockReceived sent for a block that is finalized that races ahead of the call from the client to NN to close the block/file. Hence I think the original patch fixes a bug and introduces another. Jitendra observed that HDFS-1779 no longer sends the blockReceived but instead sends a bbw-report. Hence I think that his patch is correct. However how do you explain todd's statement that the original patch has been in production for over a year. I thought it would be quite common for a blockReceived for a finalized block to race ahead of the client's call to the NN. In this case the file would not close till the next BReport.
          Hide
          Jitendra Nath Pandey added a comment -

          That's a valid point.

          I am attaching a patch without the addStoredBlock change, just for the sake of comparision. The added tests pass with this patch also.

          Show
          Jitendra Nath Pandey added a comment - That's a valid point. I am attaching a patch without the addStoredBlock change, just for the sake of comparision. The added tests pass with this patch also.
          Hide
          Todd Lipcon added a comment -

          That seems to make sense. But given this patch has been in production for a year so in lots of clusters, I'm more comfortable committing the full fix. Otherwise, would have to take a lot of hours to think through the possibilities and edge cases again

          Show
          Todd Lipcon added a comment - That seems to make sense. But given this patch has been in production for a year so in lots of clusters, I'm more comfortable committing the full fix. Otherwise, would have to take a lot of hours to think through the possibilities and edge cases again
          Hide
          Jitendra Nath Pandey added a comment -

          Does HDFS-1779 partly fix this one? As per HDFS-1779, since bbw blocks are only added to targets, the problem of premature completeFile should not happen due to non-finalized blocks being added in addStoredBlock. That will leave only the case of commitBlockSynchronization in HDFS-1197 which is a simpler fix. Is this assessment correct?

          Show
          Jitendra Nath Pandey added a comment - Does HDFS-1779 partly fix this one? As per HDFS-1779 , since bbw blocks are only added to targets, the problem of premature completeFile should not happen due to non-finalized blocks being added in addStoredBlock. That will leave only the case of commitBlockSynchronization in HDFS-1197 which is a simpler fix. Is this assessment correct?
          Hide
          Todd Lipcon added a comment -

          Here's a patch, applies on top of HDFS-1186 and HDFS-1218, HDFS-1057.

          Been in heavy testing for a week or so, should be stable.

          Show
          Todd Lipcon added a comment - Here's a patch, applies on top of HDFS-1186 and HDFS-1218 , HDFS-1057 . Been in heavy testing for a week or so, should be stable.
          Hide
          dhruba borthakur added a comment -

          I am marking this for the append-branch to ensure that we investigate and follow through on this one.

          Show
          dhruba borthakur added a comment - I am marking this for the append-branch to ensure that we investigate and follow through on this one.
          Hide
          Todd Lipcon added a comment -

          Updating JIRA name to match the bug. There are several issues at hand:
          1) commitBlockSynchronization was adding the nodes to the blocks map, even if the block is the last block in an in-progress file. This happens if there is a pipeline failure, for example.
          2) addStoredBlock was adding the node to the blocks map even if the block is an earlier-GS replica of the last block in an in-progress file. This happens if one of the DNs restarts and sends its earlier-GS RWR block to the NN while the writer is still writing to the same block.

          Both of these cases cause "completeFile" to be able to finalize the INodeFileUnderConstruction before the correct length of the last block has been reported, which results in temporarily visible truncation of the appended data.

          Show
          Todd Lipcon added a comment - Updating JIRA name to match the bug. There are several issues at hand: 1) commitBlockSynchronization was adding the nodes to the blocks map, even if the block is the last block in an in-progress file. This happens if there is a pipeline failure, for example. 2) addStoredBlock was adding the node to the blocks map even if the block is an earlier-GS replica of the last block in an in-progress file. This happens if one of the DNs restarts and sends its earlier-GS RWR block to the NN while the writer is still writing to the same block. Both of these cases cause "completeFile" to be able to finalize the INodeFileUnderConstruction before the correct length of the last block has been reported, which results in temporarily visible truncation of the appended data.
          Hide
          Todd Lipcon added a comment -

          Here are some changes which cause this problem to show up very reliably in various tests. This patch adds an undocumented conf that causes the DN to delay for some millis before sending blockReceived calls, and exposes bugs where completeFile() gets called before the DNs report the new blocks.

          I have a patch that improves the situation, but still failing occasionally it seems... will run it through a couple hundred Hudson builds and see how it goes.

          Show
          Todd Lipcon added a comment - Here are some changes which cause this problem to show up very reliably in various tests. This patch adds an undocumented conf that causes the DN to delay for some millis before sending blockReceived calls, and exposes bugs where completeFile() gets called before the DNs report the new blocks. I have a patch that improves the situation, but still failing occasionally it seems... will run it through a couple hundred Hudson builds and see how it goes.
          Hide
          Todd Lipcon added a comment -

          I think this was introduced by the following part of the HDFS-200 patch:

          -        }
          -      }
          -      if (closeFile) {
          -        // the file is getting closed. Insert block locations into blocksMap.
          -        // Otherwise fsck will report these blocks as MISSING, especially if the
          -        // blocksReceived from Datanodes take a long time to arrive.
          -        for (int i = 0; i < descriptors.length; i++) {
                     descriptors[i].addBlock(newblockinfo);
                   }
          -        pendingFile.setLastBlock(newblockinfo, null);
          -      } else {
          -        // add locations into the INodeUnderConstruction
          -        pendingFile.setLastBlock(newblockinfo, descriptors);
                 }
          +      // add locations into the INodeUnderConstruction
          +      pendingFile.setLastBlock(newblockinfo, descriptors);
               }
          

          I have some unit tests to show the issue, and working on a fix. I think this should be considered a blocker for 0.20-append

          Show
          Todd Lipcon added a comment - I think this was introduced by the following part of the HDFS-200 patch: - } - } - if (closeFile) { - // the file is getting closed. Insert block locations into blocksMap. - // Otherwise fsck will report these blocks as MISSING, especially if the - // blocksReceived from Datanodes take a long time to arrive. - for ( int i = 0; i < descriptors.length; i++) { descriptors[i].addBlock(newblockinfo); } - pendingFile.setLastBlock(newblockinfo, null ); - } else { - // add locations into the INodeUnderConstruction - pendingFile.setLastBlock(newblockinfo, descriptors); } + // add locations into the INodeUnderConstruction + pendingFile.setLastBlock(newblockinfo, descriptors); } I have some unit tests to show the issue, and working on a fix. I think this should be considered a blocker for 0.20-append
          Hide
          Todd Lipcon added a comment -

          The sequence in this test was:

          • Client opens existing block for append
          • Client appends 16KB, closes pipeline
          • Client calls completeFile() which finalizes the inode
          • DN blockReceived calls arrive

          I don't think we properly handle the case in checkFileProgress to see if we actually seen finalized replicas of the newest generation stamp.

          I'll work on a test case that delays the blockReceived calls

          Show
          Todd Lipcon added a comment - The sequence in this test was: Client opens existing block for append Client appends 16KB, closes pipeline Client calls completeFile() which finalizes the inode DN blockReceived calls arrive I don't think we properly handle the case in checkFileProgress to see if we actually seen finalized replicas of the newest generation stamp. I'll work on a test case that delays the blockReceived calls
          Hide
          Todd Lipcon added a comment -

          Attaching logs from the test run.

          Show
          Todd Lipcon added a comment - Attaching logs from the test run.

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development