Hadoop HDFS / HDFS-6945

BlockManager should remove a block from excessReplicateMap and decrement ExcessBlocks metric when the block is removed

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.5.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Target Version/s:

      Description

      I'm seeing the ExcessBlocks metric grow to more than 300K in some clusters, even though there are no over-replicated blocks (confirmed by fsck).

      After further investigation, I noticed that when a block is deleted, BlockManager does not remove it from excessReplicateMap or decrement excessBlocksCount.
      Usually the metric is decremented when a block report is processed; however, if the block has already been deleted, BlockManager neither removes the block from excessReplicateMap nor decrements the metric.
      As a result, the metric and excessReplicateMap can grow without bound (i.e. a memory leak can occur).

      1. HDFS-6945-005.patch
        3 kB
        Akira AJISAKA
      2. HDFS-6945-004.patch
        3 kB
        Akira AJISAKA
      3. HDFS-6945-003.patch
        3 kB
        Akira AJISAKA
      4. HDFS-6945.2.patch
        3 kB
        Akira AJISAKA
      5. HDFS-6945.patch
        3 kB
        Akira AJISAKA

        Activity

        Akira AJISAKA added a comment -

        The number of excess blocks is incremented but not decremented in the following sequence.

        1. A block becomes over-replicated.
        2. The NN asks a DN to delete an excess block.
        3. The DN deletes the block.
        4. The file that includes the block is deleted before the NN receives a block report from the DN.

        If the block has already been deleted, the counter is not decremented when the block report is processed.
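
The sequence above can be reproduced with a toy model. This is only a sketch: the class, its fields, and its methods are hypothetical stand-ins for the NameNode bookkeeping, not the actual BlockManager code, and DataNodes are identified by plain strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the leak; every name here is illustrative, not Hadoop's API. */
class ExcessLeakSketch {
  // Blocks the NameNode currently knows about (stand-in for blocksMap).
  final Set<Long> blocksMap = new HashSet<>();
  // DataNode ID -> excess block IDs (stand-in for excessReplicateMap).
  final Map<String, Set<Long>> excessReplicateMap = new HashMap<>();
  long excessBlocksCount = 0;

  void addBlock(long blockId) {
    blocksMap.add(blockId);
  }

  // Steps 1-2: the replica on dn is recorded as excess and scheduled for deletion.
  void markExcess(String dn, long blockId) {
    if (excessReplicateMap.computeIfAbsent(dn, k -> new HashSet<>()).add(blockId)) {
      excessBlocksCount++;
    }
  }

  // Step 4 (buggy): deleting the file drops the block from blocksMap only.
  void removeBlockBuggy(long blockId) {
    blocksMap.remove(blockId);
  }

  // Block report processing: the cleanup is skipped for unknown blocks.
  void onReplicaDeletedReport(String dn, long blockId) {
    if (!blocksMap.contains(blockId)) {
      return; // block already deleted, so the excess entry is never cleaned up
    }
    Set<Long> excess = excessReplicateMap.get(dn);
    if (excess != null && excess.remove(blockId)) {
      excessBlocksCount--;
      if (excess.isEmpty()) {
        excessReplicateMap.remove(dn);
      }
    }
  }
}
```

Calling addBlock, markExcess, removeBlockBuggy and then onReplicaDeletedReport for the same block leaves the counter at 1 and the entry in the map, which matches the behaviour described in this issue.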

        Akira AJISAKA added a comment -

        I propose adding logic to the BlockManager#removeBlock(Block) and BlockManager#removeBlockFromMap(Block) methods that removes the block from excessReplicateMap and decrements the counter.
        As it stands, excessReplicateMap can grow large, which amounts to a memory leak.
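
A minimal sketch of the proposed cleanup, under the assumption that blocks are plain IDs and DataNodes are strings. The class and method bodies are hypothetical and only illustrate the idea; they are not taken from the attached patch.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of the proposed cleanup; every name here is illustrative. */
class ExcessCleanupSketch {
  final Set<Long> blocksMap = new HashSet<>();
  // DataNode ID -> excess block IDs (stand-in for excessReplicateMap).
  final Map<String, Set<Long>> excessReplicateMap = new TreeMap<>();
  long excessBlocksCount = 0;

  void markExcess(String dn, long blockId) {
    blocksMap.add(blockId);
    if (excessReplicateMap.computeIfAbsent(dn, k -> new HashSet<>()).add(blockId)) {
      excessBlocksCount++;
    }
  }

  // Removing a block now also clears its excess bookkeeping.
  void removeBlock(long blockId) {
    blocksMap.remove(blockId);
    removeFromExcessReplicateMap(blockId);
  }

  // Scans every DataNode's excess set. Only the inner sets are modified,
  // so iterating over the map's values is safe here. Pruning emptied
  // per-DataNode sets is omitted to keep the sketch short.
  void removeFromExcessReplicateMap(long blockId) {
    for (Set<Long> excess : excessReplicateMap.values()) {
      if (excess.remove(blockId)) {
        excessBlocksCount--;
      }
    }
  }
}
```

With this shape, the counter is decremented once per excess replica of the removed block, whether or not a block report for it ever arrives.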

        Akira AJISAKA added a comment -

        Attaching a patch.

        Akira AJISAKA added a comment -

        Updated the summary and the description.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12664591/HDFS-6945.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The test build failed in hadoop-hdfs-project/hadoop-hdfs

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7782//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7782//console

        This message is automatically generated.

        Akira AJISAKA added a comment -

        Updated the patch to avoid ConcurrentModificationException when removing a value from TreeMap.
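
For context, a general Java illustration of this kind of failure, not the code from the patch: removing a key from a TreeMap inside a for-each loop over its entries invalidates the fail-fast iterator, while Iterator#remove does not. The class and method names below are hypothetical.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Illustrative only: unsafe vs. safe removal while iterating a map. */
class TreeMapRemovalSketch {
  // Unsafe: structurally modifies the map inside a for-each loop.
  static void pruneUnsafe(Map<String, Set<Long>> map, long blockId) {
    for (Map.Entry<String, Set<Long>> e : map.entrySet()) {
      e.getValue().remove(blockId);
      if (e.getValue().isEmpty()) {
        // With a TreeMap, the iterator throws ConcurrentModificationException
        // on its next step if more entries remain.
        map.remove(e.getKey());
      }
    }
  }

  // Safe: removes the current entry through the iterator itself.
  static void pruneSafe(Map<String, Set<Long>> map, long blockId) {
    Iterator<Map.Entry<String, Set<Long>>> it = map.entrySet().iterator();
    while (it.hasNext()) {
      Set<Long> excess = it.next().getValue();
      excess.remove(blockId);
      if (excess.isEmpty()) {
        it.remove();
      }
    }
  }
}
```

The unsafe variant can appear to work when the removed entry happens to be the last one visited, which makes this kind of bug easy to miss in small tests.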

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12665699/HDFS-6945.2.patch
        against trunk revision 258c7d0.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
        org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7864//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7864//console

        This message is automatically generated.

        Akira AJISAKA added a comment -

        The test failures look unrelated to the patch. HDFS-6980 and HDFS-6694 track these.

        Akira AJISAKA added a comment -

        I think the patch is ready for review.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12665699/HDFS-6945.2.patch
        against trunk revision 1556f86.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8877//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8877//console

        This message is automatically generated.

        Akira AJISAKA added a comment -

        Can anyone review this patch?
        This issue does cause a memory leak, so I want to fix it as soon as possible.

        Akira AJISAKA added a comment -

        The v3 patch removes the unnecessary matching by keys in the TreeMap.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12702070/HDFS-6945-003.patch
        against trunk revision b18d383.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.server.balancer.TestBalancer
        org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9705//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9705//console

        This message is automatically generated.

        Tsz Wo Nicholas Sze added a comment -

        removeFromExcessReplicateMap is quite expensive. It iterates all the storages in excessReplicateMap to find the given block.

        How about getting the block info from the blocksMap first? The storage information can then be used to remove the block from excessReplicateMap.
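
A rough sketch of this suggestion, assuming the block-to-storage mapping can be modelled as a map from block ID to the DataNodes holding a replica. All names are hypothetical and the real BlockManager data structures differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of the targeted lookup; every name here is illustrative. */
class TargetedExcessRemovalSketch {
  // Block ID -> DataNodes holding a replica (stand-in for blocksMap storage info).
  final Map<Long, Set<String>> blocksMap = new HashMap<>();
  // DataNode ID -> excess block IDs (stand-in for excessReplicateMap).
  final Map<String, Set<Long>> excessReplicateMap = new TreeMap<>();
  long excessBlocksCount = 0;

  void addReplica(long blockId, String dn) {
    blocksMap.computeIfAbsent(blockId, k -> new HashSet<>()).add(dn);
  }

  void markExcess(String dn, long blockId) {
    if (excessReplicateMap.computeIfAbsent(dn, k -> new HashSet<>()).add(blockId)) {
      excessBlocksCount++;
    }
  }

  // Looks up only the DataNodes that hold the block instead of scanning
  // every entry of excessReplicateMap.
  void removeBlock(long blockId) {
    Set<String> holders = blocksMap.remove(blockId);
    if (holders == null) {
      return; // unknown block, nothing to clean up
    }
    for (String dn : holders) {
      Set<Long> excess = excessReplicateMap.get(dn);
      if (excess != null && excess.remove(blockId)) {
        excessBlocksCount--;
        if (excess.isEmpty()) {
          excessReplicateMap.remove(dn); // safe: we iterate holders, not this map
        }
      }
    }
  }
}
```

The cost of a removal then depends on the number of replicas of the block rather than on the number of DataNodes with excess entries.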

        Akira AJISAKA added a comment -

        Thanks Tsz Wo Nicholas Sze for your review. Updated the patch based on your comment.
        I had missed your comment; sorry for the late response.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12707703/HDFS-6945-004.patch
        against trunk revision 3d9132d.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10103//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10103//console

        This message is automatically generated.

        Tsz Wo Nicholas Sze added a comment -

        The new patch looks good. However, it cannot be applied cleanly. We need to update the patch. For the new patch, could you also change removeBlock(..) to call removeBlockFromMap(..) instead of calling the individual methods?

        Akira AJISAKA added a comment -

        Thanks Tsz Wo Nicholas Sze for the comment. Cleaned up the patch.


          People

          • Assignee:
            Akira AJISAKA
            Reporter:
            Akira AJISAKA
          • Votes:
            0
            Watchers:
            6
