Hadoop Common
HADOOP-5134

FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.2
    • Fix Version/s: 0.18.4, 0.19.1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      From my understanding of the sync/append design, an under-construction block should not have any block locations associated with it in the blocksMap, so an under-construction block is not managed by the ReplicationMonitor.

      However, if there is an error in the write pipeline, lease recovery will trigger a call, commitBlockSynchronization, to the NN. This call adds the successfully recovered datanodes to the blocksMap, which seems to violate the design. It should update the targets of the last block in the INode instead.
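      For illustration, here is a minimal, self-contained sketch of the behavior this issue asks for (and that the patch, per the discussion below, implements), using hypothetical simplified stand-in types; Block, DatanodeDescriptor, INodeFileUnderConstruction, and BlocksMapSketch below are named after, but are not faithful to, the real NameNode classes. The recovered replica locations are recorded as targets on the under-construction INode, and the blocksMap is only touched if the recovery also closes the file.

      import java.util.ArrayList;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      // Hypothetical, heavily simplified stand-ins for NameNode structures.
      class Block {
        final long id;
        Block(long id) { this.id = id; }
      }

      class DatanodeDescriptor {
        final String name;
        DatanodeDescriptor(String name) { this.name = name; }
      }

      class INodeFileUnderConstruction {
        Block lastBlock;
        List<DatanodeDescriptor> targets = new ArrayList<>();  // last-block locations live here
      }

      class BlocksMapSketch {
        // blockId -> replica locations that the ReplicationMonitor may manage
        private final Map<Long, List<DatanodeDescriptor>> map = new HashMap<>();

        void addLocation(Block b, DatanodeDescriptor dn) {
          map.computeIfAbsent(b.id, k -> new ArrayList<>()).add(dn);
        }

        // Intended behavior: while the file is still under construction (client-driven
        // recovery), record the recovered locations on the INode only; update the
        // blocksMap only when the recovery also closes the file.
        void commitBlockSynchronization(INodeFileUnderConstruction inode,
                                        Block recoveredBlock,
                                        List<DatanodeDescriptor> newTargets,
                                        boolean closeFile) {
          inode.lastBlock = recoveredBlock;
          inode.targets = new ArrayList<>(newTargets);
          if (closeFile) {
            for (DatanodeDescriptor dn : newTargets) {
              addLocation(recoveredBlock, dn);  // replicas become visible for replication management
            }
          }
        }
      }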

      1. commitBlockSync3.patch
        4 kB
        dhruba borthakur
      2. commitBlockSync2.patch
        6 kB
        dhruba borthakur
      3. commitBlockSync-0.18.patch
        3 kB
        dhruba borthakur
      4. commitBlockSync.patch
        4 kB
        dhruba borthakur

        Activity

        Hudson added a comment -

        Integrated in Hadoop-trunk #758 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/758/)
        HADOOP-5134. FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap. Contributed by Dhruba Borthakur.

        Hairong Kuang added a comment -

        I've just committed this. Thanks, Dhruba!

        dhruba borthakur added a comment -

        Patch for 0.18 branch.

        Hairong Kuang added a comment -

        Dhruba, could you please upload a patch for 0.18?

        Hairong Kuang added a comment -

        +1. The patch looks good to me.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12400220/commitBlockSync3.patch
        against trunk revision 744406.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3866/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3866/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3866/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3866/console

        This message is automatically generated.

        dhruba borthakur added a comment -

        Ok, removed checking the data-crc lengths. Will do it as a separate JIRA.

        Hairong Kuang added a comment -

        > I also added a check that is triggered during lease recovery that matches the crc file with the data file (in length) before finalizing a block.
        Checking the length alone does not seem good enough. We should verify the checksum for the last chunk in the block. Let's open a separate JIRA for this problem. Also in the code,
        if (f.length() > maxDataSize || f.length() < minDataSize) should be if (f.length() > maxDataSize || f.length() <= minDataSize).
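        For illustration, a hedged sketch of the length-consistency check under discussion, with hypothetical names and an assumed crc-file layout (a small header followed by one fixed-size checksum entry per bytesPerChecksum-sized data chunk); the real DataNode code may differ, but the point is the corrected lower bound:

        // Hypothetical sketch, not the actual patch code.
        class LengthCheckSketch {
          static boolean dataLengthConsistentWithCrc(long dataFileLength,
                                                     long crcFileLength,
                                                     int bytesPerChecksum,
                                                     int checksumSize,
                                                     int crcHeaderSize) {
            long numChunks = (crcFileLength - crcHeaderSize) / checksumSize;
            long maxDataSize = numChunks * bytesPerChecksum;        // n full chunks
            long minDataSize = (numChunks - 1) * bytesPerChecksum;  // n - 1 full chunks
            // Corrected per the comment above: the lower bound is exclusive, so use <=.
            if (dataFileLength > maxDataSize || dataFileLength <= minDataSize) {
              return false;  // crc file and data file disagree; do not finalize the block
            }
            return true;
          }
        }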

        dhruba borthakur added a comment -

        Cleaned up the code as suggested by Hairong. I also added a check that is triggered during lease recovery that matches the crc file with the data file (in length) before finalizing a block. This ensures that the data in the block file is good.

        Hairong Kuang added a comment -

        After much thought, I'd like to +1 on the approach in this patch as an emergency fix to 0.18 and 0.19, but leaving the discussion of lease close semantics and block length inconsistency handling to a later time.

        Comments to the patch:
        lines 1897-1900 in the merged FSNamesystem.java: why is descriptors not declared in the else block? It seems that descriptors is used only there.
        line 1915: no need to initialize descriptors again
        line 1916: not sure if ">=" is needed; trunk uses only ">".

        Could you please also upload a patch for 0.18?

        Hairong Kuang added a comment -

        What if a blockReceived has been received before commitBlockSync()? Should commitBlockSync() check and handle an inconsistent block length? I do not feel comfortable about this patch because it breaks the rule that we set in HADOOP-5133.

        dhruba borthakur added a comment -

        > But how about lease recovery when lease expires?

        At this time, it should be safe to update the blocksMap. Since the file still has a lease when the replicas are added to the blocksMap (via commitBlockSync), the NN should not check/detect corrupt replicas.

        HADOOP-5133 caused a problem because the client wrote more data to the block after the first call to commitBlockSync(). However, when the NN triggers lease recovery, the client cannot write any more data to the block. Thus, in this case, it is safe for the NN to insert the replica in the blocksMap.

        Hairong Kuang added a comment -

        But how about lease recovery when the lease expires? This is part of 0.18 even if nobody invokes sync or append.

        dhruba borthakur added a comment -

        This patch does not have much to do with "appends"; rather, it is closely related to datanode(s) dropping off the write pipeline during a write from a client.

        When a client is writing to a file and encounters an error (bad datanode in the pipeline), it invokes commitBlockSync(). This patch guarantees that this call to commitBlockSync() does not add any replicas to the blocksMap. That's what you wanted, isn't it?

        Hairong Kuang added a comment -

        My point is that a lease should not be removed before all blocks of an under-construction file reach the minimum replication factor. Whether this is part of lease recovery or not is an implementation detail.

        With your patch, is the last block still at the risk described in HADOOP-5133? If so, I would prefer to do this right instead of worrying about append. This is a patch for 0.18, and no append is supported in this version.

        dhruba borthakur added a comment -

        This patch does the following:

        1. A commitBlockSync does not add the block locations of the last block to the blocksMap; it stores them in the INodeUnderConstruction instead. However, if the commitBlockSync occurs as part of a lease recovery that is triggered by the NN (i.e., the file is getting closed), then it does update the blocksMap.
        2. Opening a file for append causes the last partial block of the file (if any) to move its block locations from the blocksMap to the INodeUnderConstruction.
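        For illustration, a minimal, self-contained sketch of item 2 using hypothetical simplified stand-in types (not the real FSNamesystem/INode API): reopening a file for append moves the last partial block's locations out of the blocksMap and onto the under-construction INode, so the ReplicationMonitor stops managing that block while it is being written.

        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        // Hypothetical simplified stand-ins, not the real NameNode classes.
        class AppendSketch {
          static class Block {
            final long id;
            Block(long id) { this.id = id; }
          }

          static class INodeUnderConstruction {
            Block lastBlock;
            List<String> targets = new ArrayList<>();  // last-block locations while the file is open
          }

          // blockId -> datanode names visible to the ReplicationMonitor
          private final Map<Long, List<String>> blocksMap = new HashMap<>();

          // Item 2, sketched: move the last partial block's locations from the
          // blocksMap to the under-construction INode when the file is reopened for append.
          void openForAppend(INodeUnderConstruction inode, Block lastPartialBlock) {
            List<String> locations = blocksMap.remove(lastPartialBlock.id);
            inode.lastBlock = lastPartialBlock;
            inode.targets = (locations != null) ? locations : new ArrayList<>();
          }
        }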

        dhruba borthakur added a comment -

        I agree that "lease recovery initiating from NN is equal to file close from a client". Are you suggesting that the commitBlockSync will fail if the last block has not reached the minimum replication factor? I think this would not work too well, because restarting the lease recovery protocol would again stamp all the valid blocks with a new gs, and each datanode will again have to send blockReceived (with the new gs) to the NN. It may so happen that the re-tried commitBlockSync call again reaches the NN before the new blockReceived calls. This can cause lease recovery to never end. Does this make sense?

        Hairong Kuang added a comment -

        From my understanding, lease recovery initiating from NN is equal to file close from a client. So a lease should not be removed before the last block reaches the minimum replication factor. Would this solve the problem?
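        For illustration, a rough sketch of this suggestion with hypothetical names (MIN_REPLICATION and tryRemoveLease are not real NameNode identifiers): the lease would be kept until enough blockReceived reports have arrived for the last block.

        // Hypothetical sketch of the suggestion, not actual NameNode code: treat
        // NN-initiated lease recovery like a client close, and do not remove the
        // lease until the last block has at least the minimum number of reported replicas.
        class LeaseCloseSketch {
          static final int MIN_REPLICATION = 1;  // assumed minimum replication factor

          boolean tryRemoveLease(int reportedReplicasOfLastBlock) {
            if (reportedReplicasOfLastBlock < MIN_REPLICATION) {
              return false;  // keep the lease; wait for more blockReceived reports
            }
            // ... remove the lease and finalize the file ...
            return true;
          }
        }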

        dhruba borthakur added a comment -

        At first I thought that the fix was simple: commitBlockSync should not update the blocksMap. But this could cause a subtle problem. The problem is that the above fix could cause a call to "append" to fail when it should not. Let me try to explain.

        Suppose a writer is writing data to a file and dies before closing the file. A new writer starts and invokes "append" to try to append to this file. This "append" call triggers lease recovery, and thereby causes the Primary Datanode to invoke commitBlockSync. Meanwhile, the append call fails (as expected) with AlreadyBeingCreatedException. The commitBlockSync call removes the lease but does not update the block locations of the last block. The datanode(s) that have the last block send blockReceived to the NN, but before these reach the NN, the NN starts processing another call to "append". This append will now find that the file is not under construction anymore, but that the last block of the file does not have any block locations associated with it. This means that the "append" call will fail. This is not expected behaviour.


          People

          • Assignee:
            dhruba borthakur
          • Reporter:
            Hairong Kuang
          • Votes:
            0
          • Watchers:
            2
