Hadoop HDFS / HDFS-140

When a file is deleted, its blocks remain in the blocksmap till the next block report from Datanode

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.20.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      When a file is deleted, the namenode sends block deletion messages to the appropriate datanodes. However, the namenode does not delete these blocks from the blocksmap. Instead, the blocks are removed from the blocksmap only when the next block report from the datanode is processed.

      If we want to make block report processing less frequent, this issue needs to be addressed. It also introduces non-deterministic behaviour into a few unit tests. Another factor to consider is that any fix must not compromise duplicate block detection.
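
The behaviour described above can be illustrated with a minimal sketch. This is a toy model, not Hadoop code: the class `LazyBlocksMapSketch`, its fields, and its methods are all hypothetical names chosen for illustration, and a single datanode is assumed. It only shows that a deleted file's entries stay in the map until a block report is processed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the reported behaviour. Every name here is hypothetical. */
public class LazyBlocksMapSketch {
    // Block id -> path of the owning file; stands in for the blocksmap.
    private final Map<Long, String> blocksMap = new HashMap<>();
    // Path -> block ids of a live file.
    private final Map<String, List<Long>> files = new HashMap<>();
    // Block ids the datanode has been asked to delete.
    private final Set<Long> invalidates = new HashSet<>();

    public void addFile(String path, long... blockIds) {
        List<Long> ids = new ArrayList<>();
        for (long id : blockIds) {
            ids.add(id);
            blocksMap.put(id, path);
        }
        files.put(path, ids);
    }

    /** Queues deletion messages but leaves the blocksmap entries in place. */
    public void deleteFile(String path) {
        List<Long> ids = files.remove(path);
        if (ids != null) {
            invalidates.addAll(ids);
        }
    }

    /** Drops entries whose file is gone and which the datanode no longer reports. */
    public void processBlockReport(Set<Long> reportedBlockIds) {
        blocksMap.entrySet().removeIf(e ->
                !files.containsKey(e.getValue()) && !reportedBlockIds.contains(e.getKey()));
        // Deletions are no longer pending for blocks the datanode has dropped.
        invalidates.retainAll(reportedBlockIds);
    }

    public int blocksMapSize() {
        return blocksMap.size();
    }

    public int pendingInvalidates() {
        return invalidates.size();
    }

    public static void main(String[] args) {
        LazyBlocksMapSketch nn = new LazyBlocksMapSketch();
        nn.addFile("/a", 1L, 2L);
        nn.deleteFile("/a");
        // The deleted file's blocks are still counted here.
        System.out.println("after delete: " + nn.blocksMapSize());
        // The datanode reports that it no longer stores any blocks.
        nn.processBlockReport(new HashSet<Long>());
        System.out.println("after block report: " + nn.blocksMapSize());
    }
}
```

In this model, the longer the interval between block reports, the longer stale entries stay in the map, which is the concern raised about making block reports less frequent.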

          Activity

          Uma Maheswara Rao G added a comment -

          Updated the patch for 20Security205 branch!
          Note: TestDFSRemove has been backported for more coverage related to deletes.

          Thanks
          Uma

          Todd Lipcon added a comment -

          Hi Uma. Is this already resolved in trunk? Having trouble following the various JIRAs (many seem to be resolved as dup). Which JIRA fixed it in trunk, and what's the motivation to fix in the maintenance release?

          I haven't looked at the patch yet, but seems like a potentially destabilizing change to introduce if there aren't any burning issues that are caused by the lack of this improvement.

          Uma Maheswara Rao G added a comment -

          Hi Todd,

          Thanks a lot for taking a look!

          I have verified this in the trunk codebase. This issue has already been addressed there.

              public void removeBlock(Block block) {
                block.setNumBytes(BlockCommand.NO_ACK);
                addToInvalidates(block);
                corruptReplicas.removeFromCorruptReplicasMap(block);
                blocksMap.removeBlock(block);
              }

          FSNamesystem#removePathAndBlocks invokes this API, which ensures that the block is removed from the blocksMap.
          It looks to me that this issue has been fixed from 0.21 onwards, but I could not find the exact JIRA for this problem.

          However, I am using version 20.2. When I profiled it, I found that the number of blocksMap elements keeps growing, and that it comes down again after block reports.

          So, from looking at the code, the current 20X versions (20.205, 20.206) also have this problem.

          Thanks
          Uma
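
For contrast, the eager removal in the `removeBlock` snippet quoted above can be sketched in the same toy style. Only the order of the two steps (queue the invalidation, then remove the entry from the map) comes from that snippet. The class `EagerBlocksMapSketch` and everything else in it are hypothetical, and the corrupt-replica bookkeeping and `NO_ACK` marking are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of eager removal on delete. Every name here is hypothetical. */
public class EagerBlocksMapSketch {
    // Block id -> path of the owning file; stands in for the blocksmap.
    private final Map<Long, String> blocksMap = new HashMap<>();
    // Path -> block ids of a live file.
    private final Map<String, List<Long>> files = new HashMap<>();
    // Block ids the datanode has been asked to delete.
    private final Set<Long> invalidates = new HashSet<>();

    public void addFile(String path, long... blockIds) {
        List<Long> ids = new ArrayList<>();
        for (long id : blockIds) {
            ids.add(id);
            blocksMap.put(id, path);
        }
        files.put(path, ids);
    }

    /** Queues the deletion message and drops the map entry in the same step. */
    private void removeBlock(long blockId) {
        invalidates.add(blockId);
        blocksMap.remove(blockId);
    }

    public void deleteFile(String path) {
        List<Long> ids = files.remove(path);
        if (ids != null) {
            for (long id : ids) {
                removeBlock(id);
            }
        }
    }

    public int blocksMapSize() {
        return blocksMap.size();
    }

    public int pendingInvalidates() {
        return invalidates.size();
    }

    public static void main(String[] args) {
        EagerBlocksMapSketch nn = new EagerBlocksMapSketch();
        nn.addFile("/a", 1L, 2L);
        nn.deleteFile("/a");
        // No block report is needed before the entries disappear.
        System.out.println("after delete: " + nn.blocksMapSize());
        System.out.println("pending deletions: " + nn.pendingInvalidates());
    }
}
```

Here the map shrinks as soon as the file is deleted, so its size no longer depends on when the next block report arrives.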

          Todd Lipcon added a comment -

          Thanks for digging to see that this is fixed in trunk. So my question remains: though this is demonstrably a problem in the 20 series, is it causing any production issues? Since 20x is a maintenance release series, I think we need some good justification that it's causing a production issue somewhere.

          Do others agree or am I being too paranoid? Dhruba/Nicholas?

          Suresh Srinivas added a comment -

          I agree with Todd on this. This does not cause any production issue. We should skip this for 2xx.

          dhruba borthakur added a comment -

          I would like to agree with Todd too. Uma: do you have a use-case why you definitely need this in 0.20?

          Uma Maheswara Rao G added a comment -

          Hi,

          I can see only the problems below:
          1) Memory consumption in the NN (if we have many deletes and creates).
          2) Block information in the UI will not be reflected immediately. But these are not very serious problems.

          We should push this only if we have confidence in it. In our internal branch we fixed it about 6 months back (as far as I remember), and I have not seen any problems with this fix so far.

          Thanks a lot for your time to check this issue.

          Thanks
          Uma

          Uma Maheswara Rao G added a comment -

          Hi Dhruba,
          I am not insisting on pushing this. But while going through an HA issue, HDFS-1972, I saw one of your comments in which you mentioned a link:

          chooseExcessReplicates() does not really need the FSNamesystem lock. We have done this just to increase scalability: http://bit.ly/rUDVui

          While going through that file (FSNamesystem), I found that removePathAndBlocks removes the block from the blocksMap.

              }
              blocksMap.removeBlock(b);
            }

          This patch does the same here. Since you have tested with very big clusters, could you review the change once and comment?

          Thanks
          Uma

          dhruba borthakur added a comment -

          Hi Uma, your technical point makes sense, but my feeling is that it is too late to roll it into the 0.20 release. It is already fixed in newer releases, so new users will automatically get this fix. For people who are stuck with older 0.20-based releases, they can pull this patch into their code base on their own, is it not?

          Uma Maheswara Rao G added a comment -

          Hi Dhruba, I completely agree with you, and thanks a lot for your time.

          For people who are stuck with older 0.20-based releases, they can pull this patch into their code base on their own, is it not?

          I am assuming that you have taken a look at the patch here and reviewed it. If not, you can just update your comments. At least the people who want to use it will then have confidence that this is a reviewed patch.

          Thanks
          Uma

          Uma Maheswara Rao G added a comment -

          As this is an improvement and not a serious issue for 1.X versions, I am marking it as Won't Fix.
          I have also confirmed that this issue is not present in trunk.


            People

            • Assignee: Uma Maheswara Rao G
            • Reporter: dhruba borthakur
            • Votes: 0
            • Watchers: 7
