HDFS-381

Datanode should report deletion of blocks to Namenode explicitly

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None

      Description

      Currently the datanode notifies the namenode of newly added blocks and of blocks that are corrupt. There is no explicit message from the datanode to the namenode to indicate the deletion of blocks; block reports from the datanode are the only way for the namenode to learn about deletions at a datanode. With the addition of an explicit request indicating block deletion, the block report interval (currently 1 hour) can be increased to a longer duration. This reduces load on both the namenode and the datanodes.
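
      Purely as illustration, below is one shape such an explicit deletion message could take on the datanode-to-namenode protocol. The interface and method names are hypothetical, chosen for this sketch; the fix that was eventually committed took a different route (removing blocks on the namenode side at file-deletion time, as the comments under Activity explain):

          // Hypothetical sketch only, NOT the committed change. An explicit
          // deletion report would let the namenode drop replicas from its
          // per-datanode block lists without waiting for the next full
          // block report.
          import java.io.IOException;
          import org.apache.hadoop.hdfs.protocol.Block;
          import org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration;

          public interface BlockDeletionReport {
            /** Report blocks this datanode has deleted since the last call. */
            void blocksDeleted(DatanodeRegistration registration,
                               Block[] deletedBlocks) throws IOException;
          }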

      Attachments

      1. blockdel.patch
        24 kB
        Suresh Srinivas
      2. blockdel.patch
        24 kB
        Suresh Srinivas
      3. blockdel-1.patch
        3 kB
        Suresh Srinivas
      4. blockdel-2.patch
        4 kB
        Suresh Srinivas
      5. HDFS-381.patch
        4 kB
        Raghu Angadi

        Activity

        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Raghu Angadi made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Raghu Angadi added a comment -

        I just committed this. Thanks Suresh.
        Raghu Angadi made changes -
        Fix Version/s 0.21.0 [ 12314046 ]
        Raghu Angadi made changes -
        Attachment HDFS-381.patch [ 12411459 ]
        Raghu Angadi added a comment -

        patch for hdfs trunk.
        Owen O'Malley made changes -
        Project Hadoop Common [ 12310240 ] HDFS [ 12310942 ]
        Key HADOOP-5724 HDFS-381
        Hadoop Flags [Reviewed, Incompatible change]
        Release Note * When a file is deleted, the corresponding blocks in the blocksMap on the Namenode are also deleted.
        * Block reports from the datanode were the only way to indicate deletion of blocks to the namenode. With this change, the namenode no longer depends on block reports for deletions; hence the frequency of block reports is reduced by increasing the block report interval from the current 1 hour to 6 hours.
        Component/s dfs [ 12310710 ]
        Fix Version/s 0.21.0 [ 12313563 ]
        Suresh Srinivas added a comment -

        The failed test org.apache.hadoop.hdfsproxy.TestHdfsProxy.testHdfsProxyInterface is not related to this change. No tests are included because many existing tests already cover file deletion and the corresponding block deletion. I also ensured, by running manual tests (see previous comments), that the functionality works correctly.
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12410734/blockdel-2.patch
        against trunk revision 786278.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/534/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/534/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/534/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/534/console

        This message is automatically generated.

        Suresh Srinivas made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Release Note * When a file is deleted, the corresponding blocks in the blocksMap on the Namenode are also deleted.
        * Block reports from the datanode were the only way to indicate deletion of blocks to the namenode. With this change, the namenode no longer depends on block reports for deletions; hence the frequency of block reports is reduced by increasing the block report interval from the current 1 hour to 6 hours.
        Raghu Angadi made changes -
        Hadoop Flags [Incompatible change] [Incompatible change, Reviewed]
        Issue Type Bug [ 1 ] Improvement [ 4 ]
        Component/s dfs [ 12310710 ]
        Raghu Angadi added a comment -

        +1 for the patch.
        Suresh Srinivas added a comment -

        I also ran the following manual test (a MiniDFSCluster sketch of the same steps appears after this comment):

        1. Set up HDFS with a long block report interval (dfs.blockreport.intervalMsec)
        2. Create a bunch of files in multiple directories
        3. Delete one directory at a time and ensure that, on the NameNode web UI, the block count in Cluster Summary decreases
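
        For illustration only, here is a rough MiniDFSCluster analogue of that manual test. This is a sketch, not a test from the patch; the constructor and accessor names (MiniDFSCluster, getNameNode().getNamesystem().getBlocksTotal()) follow the HDFS test utilities of that era and should be treated as assumptions:

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.FSDataOutputStream;
            import org.apache.hadoop.fs.FileSystem;
            import org.apache.hadoop.fs.Path;
            import org.apache.hadoop.hdfs.MiniDFSCluster;

            public class BlockDeletionManualTestSketch {
              public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Make the periodic block report effectively never fire (12 hours),
                // so any drop in the block count must come from the deletion path.
                conf.setLong("dfs.blockreport.intervalMsec", 12L * 3600 * 1000);
                MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
                try {
                  FileSystem fs = cluster.getFileSystem();
                  Path dir = new Path("/testdir");
                  for (int i = 0; i < 10; i++) {           // a bunch of small files
                    FSDataOutputStream out = fs.create(new Path(dir, "file" + i));
                    out.write(new byte[1024]);             // one block per file
                    out.close();
                  }
                  long before = cluster.getNameNode().getNamesystem().getBlocksTotal();
                  fs.delete(dir, true);                    // delete the whole directory
                  long after = cluster.getNameNode().getNamesystem().getBlocksTotal();
                  // With the patch, 'after' drops below 'before' immediately,
                  // without waiting for a block report (cf. the Cluster Summary UI).
                  System.out.println("blocks before=" + before + " after=" + after);
                } finally {
                  cluster.shutdown();
                }
              }
            }
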
        Suresh Srinivas made changes -
        Attachment blockdel-2.patch [ 12410734 ]
        Suresh Srinivas added a comment -

        New patch with suggested changes. Also I understood your comment about checking for null for BlockInfo and have added the check.

        Patch passes all the unit tests except TestJobHistory (tracked in HADOOP-5920).

        No new tests are added as the existing ones test the functionality already. Here is the test-patch result:

             [exec] -1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
             [exec]                         Please justify why no tests are needed for this patch.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        
        Suresh Srinivas added a comment -

        Raghu, based on your comment, removeINode() is removed and removeBlockFromMap() is used instead.

        > depending what we do with removeINode(), it may not be necessary to change removeBlockFromMap(). The new implementation does a look up but does not check if 'info' is null.

        I am not sure what you mean here. I changed removeBlockFromMap() to take Block instead of BlockInfo, to make it a more generic method that works with both Block and BlockInfo. The null check is done in BlocksMap.removeBlock(). As for the lookup, BlocksMap.removeBlock() has to call remove() on the HashMap anyway, which does a lookup; I simply optimized it so that the single remove() call serves as both the lookup and the removal of the entry from the map (see the sketch below).
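
        A minimal, self-contained illustration of the lookup optimization described above, using plain java.util types rather than the actual BlocksMap code: Map.remove() returns the removed value, so a single call serves as both the existence check and the removal.

            import java.util.HashMap;
            import java.util.Map;

            class RemoveBlockDemo {
              // Stand-in for block id -> block info in the real BlocksMap.
              private final Map<Long, String> map = new HashMap<Long, String>();

              String removeBlock(long blockId) {
                String info = map.remove(blockId); // lookup and remove in one step
                if (info == null) {
                  return null;                     // not present; nothing to clean up
                }
                // ...in the real code, further cleanup (detaching the block from
                // the inode and from datanode lists) would happen here...
                return info;
              }
            }
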
        Raghu Angadi added a comment -

        Couple of comments:

        • To enforce the new policy, it is probably better to call removeBlockFromMap() from removeINode().
        • Or maybe we don't need both?
        • Depending on what we do with removeINode(), it may not be necessary to change removeBlockFromMap(). The new implementation does a look up but does not check if 'info' is null.
        Suresh Srinivas made changes -
        Attachment blockdel-1.patch [ 12410235 ]
        Suresh Srinivas added a comment -

        The attached patch has the following changes (a simplified sketch of the flow appears after this comment):

        1. When a file is deleted, the blocks for that file are removed from the BlocksMap and added to recentInvalidateSets
        2. Changed the block report interval from 1 hour to 6 hours

        The patch passes all the unit tests.
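
        A simplified sketch of that flow, where the types and fields are stand-ins for the real FSNamesystem structures, chosen for this illustration only: on file deletion, each block is dropped from the blocks map immediately and queued per datanode so the datanodes are told to delete their replicas.

            import java.util.HashMap;
            import java.util.HashSet;
            import java.util.List;
            import java.util.Map;
            import java.util.Set;

            class DeletionFlowSketch {
              // blockId -> datanodes holding a replica (stand-in for blocksMap)
              private final Map<Long, List<String>> blocksMap =
                  new HashMap<Long, List<String>>();
              // datanode -> blocks it should delete (stand-in for recentInvalidateSets)
              private final Map<String, Set<Long>> recentInvalidateSets =
                  new HashMap<String, Set<Long>>();

              void removeFileBlocks(List<Long> fileBlocks) {
                for (long blockId : fileBlocks) {
                  // Gone from the map at once; no block report needed.
                  List<String> holders = blocksMap.remove(blockId);
                  if (holders == null) {
                    continue;
                  }
                  for (String datanode : holders) {
                    Set<Long> toInvalidate = recentInvalidateSets.get(datanode);
                    if (toInvalidate == null) {
                      toInvalidate = new HashSet<Long>();
                      recentInvalidateSets.put(datanode, toInvalidate);
                    }
                    // Picked up in the reply to the datanode's next heartbeat.
                    toInvalidate.add(blockId);
                  }
                }
              }
            }
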
        Raghu Angadi added a comment -

        +1 for the proposal.

        Does not matter if it is this jira or a new one.
        Suresh Srinivas added a comment -

        I propose the following:

        • When a file is deleted, remove the blocks that belong to the file from BlockManager.blocksMap and add them to BlockManager.recentInvalidateSets. Any subsequent attempt at adding such a block (say, triggered by a block report) will fail, since the corresponding file does not exist. Lingering blocks on datanodes will not cause issues either, since blocks generated at different times will differ due to the generation stamp. I will create a new jira for making this change.
        • Close this jira, since the issue is no longer valid.
        Suresh Srinivas added a comment -

        Raghu, thanks for correcting my misunderstanding.

        > I saw that fsnamesystem puts blocks in the deleted file into invalidate blocks set in the end of file deletion.

        Blocks in the invalidate set can be used for notifying datanodes to delete the blocks. However, we can clear the blocksMap entry and the block in DatanodeDescriptor without having to wait for a deletion confirmation from the datanode, right?
        Hairong Kuang added a comment -

        I took a look at the code again. I saw that fsnamesystem puts the blocks of a deleted file into the invalidate blocks set at the end of file deletion.

        > if NN indeed deleted the blocks when the files are deleted, there is not much motivation/need for this jira.

        This is the reason we are holding off on this patch.
        Raghu Angadi added a comment -

        Also, if NN indeed deleted the blocks when the files are deleted, there is not much motivation/need for this jira.
        Raghu Angadi added a comment -

        >> I don't think so... the blocks are removed from the datanode list only after the blockreport. from what I can see, could be mistaken.
        > Block is indeed deleted from the blocksMap when the file is deleted.

        A simple test on trunk shows otherwise. On a clean cluster with one datanode:

        1. Create a file
        2. Check that http://namenode:50070/dfsnodelist.jsp?whatNodes=LIVE shows one block
        3. Delete the file
        4. Check the datanode list page again. The block is still counted.
        5. The count goes to zero only after a block report from the datanode.
        Hairong Kuang added a comment -

        > any reason why the NN does not remove a replica from the block map at the time of sending the block-delete request to the datanode when it is trying to reduce over-replication?

        One more reason: because there is a delay between scheduling a replica for removal and the replica actually being removed at a datanode, keeping the replica in the blocksMap avoids placing the block on that datanode if an under-replication happens to occur in the meantime. That's why we had HADOOP-4643 instead of HADOOP-4540.
        Suresh Srinivas added a comment -

        > I don't think so... the blocks are removed from the datanode list only after the blockreport. from what I can see, could be mistaken.

        The block is indeed deleted from the blocksMap when the file is deleted. The reason this works is that if the next block report from the datanode includes the deleted block, the block is not added back to the blocksMap, since the corresponding file is not found.

        To summarize the difference: when a file is deleted, blocks can be deleted from the blocksMap, because the non-existent file serves as a reminder that the block is no longer valid. However, when a block is deleted from a datanode (and the file still exists), there is nothing on the namenode that indicates the block has been deleted from that datanode.
        Raghu Angadi added a comment -

        One reason could be that if a block report is sent by the DN before it sees the deletion request, but is seen by the NN after the NN sends the deletion request, that block will be added back to the datanode. There might be other related issues. Unless these are all sorted out, it is better for the block list for a DN at the NN to reflect what the DN says it is.

        > But in case of block deletion due to file deletion etc., blocks are removed from blocksMap immediately.

        I don't think so... the blocks are removed from the datanode list only after the block report, from what I can see; I could be mistaken. Otherwise, when a file is deleted, a block could be safely deleted from the datanode map. Does the generation stamp handle all the cases where the NN might reuse the same block id before the datanodes delete the previous incarnations?

        Also, the extra overhead of reporting the deleted blocks is more than compensated for by less frequent block reports.
        dhruba borthakur added a comment -

        > NN does not remove a replica from blocksMap immediately when scheduling to delete the replica due to over-replication/

        Any reason why the NN does not remove a replica from the block map at the time of sending the block-delete request to the datanode when it is trying to reduce over-replication?
        Hairong Kuang added a comment -

        The NN does not remove a replica from the blocksMap immediately when scheduling the replica for deletion due to over-replication, corruption, etc. The patch helps in this case. But for block deletion due to file deletion etc., blocks are removed from the blocksMap immediately, so for these kinds of blocks the patch introduces overhead. That's why we are holding off on this patch for now.
        dhruba borthakur added a comment -

        This is mostly a case of cleaning up space on the datanode(s), right? In the current code, the NN removes the block from the blockmap before sending the delete request to the datanode(s), essentially assuming that the block will be successfully deleted on the datanode, isn't it? Are you seeing any practical cases where this existing approach is inadequate?
        Suresh Srinivas made changes -
        Attachment blockdel.patch [ 12406257 ]
        Suresh Srinivas made changes -
        Field Original Value New Value
        Attachment blockdel.patch [ 12406168 ]
        Suresh Srinivas created issue -

          People

          • Assignee: Suresh Srinivas
          • Reporter: Suresh Srinivas
          • Votes: 0
          • Watchers: 4
