Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-11661

GetContentSummary uses excessive amounts of memory

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.8.0, 3.0.0-alpha2
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      Reverted HDFS-10797 to fix a scalability regression brought by the commit.
      Show
      Reverted HDFS-10797 to fix a scalability regression brought by the commit.

      Description

      ContentSummaryComputationContext::nodeIncluded() is being used to keep track of all INodes visited during the current content summary calculation. This can be all of the INodes in the filesystem, making for a VERY large hash table. This simply won't work on large filesystems.

      We noticed this after upgrading a namenode with ~100Million filesystem objects was spending significantly more time in GC. Fortunately this system had some memory breathing room, other clusters we have will not run with this additional demand on memory.

      This was added as part of HDFS-10797 as a way of keeping track of INodes that have already been accounted for - to avoid double counting.

      1. HDFS-11661.001.patch
        10 kB
        Wei-Chiu Chuang
      2. HDFs-11661.002.patch
        10 kB
        Wei-Chiu Chuang
      3. Heap growth.png
        184 kB
        Daryn Sharp

        Issue Links

          Activity

          Hide
          daryn Daryn Sharp added a comment -

          Words don't describe how just how horribly unacceptable the heap growth has become. Here's 1 year of heap growth on a teeny tiny 350 node cluster processing only 1k ops/sec. The previous aberration was due to a bad 2.7 release.

          Show
          daryn Daryn Sharp added a comment - Words don't describe how just how horribly unacceptable the heap growth has become. Here's 1 year of heap growth on a teeny tiny 350 node cluster processing only 1k ops/sec. The previous aberration was due to a bad 2.7 release.
          Hide
          kihwal Kihwal Lee added a comment -

          Xiao Chen, Jing Zhao, any thoughts? We've internally reverted HDFS-10797 for now as it is unusable for anything beyond light-duty test clusters.

          Show
          kihwal Kihwal Lee added a comment - Xiao Chen , Jing Zhao , any thoughts? We've internally reverted HDFS-10797 for now as it is unusable for anything beyond light-duty test clusters.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          I am +1 for reverting HDFS-10797. In fact, that's what I initially wanted to do when filing HDFS-11515 (also broken by HDFS-10797).

          Show
          jojochuang Wei-Chiu Chuang added a comment - I am +1 for reverting HDFS-10797 . In fact, that's what I initially wanted to do when filing HDFS-11515 (also broken by HDFS-10797 ).
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Kihwal Lee, Daryn Sharp and Nathan Roberts:
          could you please share what your clusters run which exposed this bug? I would like to understand this bug and reproduce the issue you mentioned. We can revert HDFS-10797, but in the long run I would still like to fix the bug that HDFS-10797 intended to fix.

          Thanks very much!

          Show
          jojochuang Wei-Chiu Chuang added a comment - Kihwal Lee , Daryn Sharp and Nathan Roberts : could you please share what your clusters run which exposed this bug? I would like to understand this bug and reproduce the issue you mentioned. We can revert HDFS-10797 , but in the long run I would still like to fix the bug that HDFS-10797 intended to fix. Thanks very much!
          Hide
          nroberts Nathan Roberts added a comment - - edited

          Wei-Chiu Chuang, sure, occasional `du -s` of very large directory trees, think 100s of millions of files/directories.

          Show
          nroberts Nathan Roberts added a comment - - edited Wei-Chiu Chuang , sure, occasional `du -s` of very large directory trees, think 100s of millions of files/directories.
          Hide
          djp Junping Du added a comment -

          Thanks Nathan Roberts and all for reporting the issue. +1 on reverting HDFS-10797 if we don't have a quick fix here. Just pinged the patch authors of HDFS-10797 to see if any magic can happen in short term.

          Show
          djp Junping Du added a comment - Thanks Nathan Roberts and all for reporting the issue. +1 on reverting HDFS-10797 if we don't have a quick fix here. Just pinged the patch authors of HDFS-10797 to see if any magic can happen in short term.
          Hide
          mackrorysd Sean Mackrory added a comment -

          +1 to the revert - I too would still like to see the original problem fixed, but this is worse. It does indeed require global context to do correctly, so it'll require some cleverness to make sure we do that without using tons of space or locking for a long time.

          Wei-Chiu Chuang - to revert cleanly we can revert HDFS-11515 (unless I'm missing something and that patch does more than just correct the original changes in HDFS-10797) first and then HDFS-10797. As Xiao Chen is not available right now, would you be able to commit the revert when we're satisfied? I'll run tests with the reverts committed locally...

          Show
          mackrorysd Sean Mackrory added a comment - +1 to the revert - I too would still like to see the original problem fixed, but this is worse. It does indeed require global context to do correctly, so it'll require some cleverness to make sure we do that without using tons of space or locking for a long time. Wei-Chiu Chuang - to revert cleanly we can revert HDFS-11515 (unless I'm missing something and that patch does more than just correct the original changes in HDFS-10797 ) first and then HDFS-10797 . As Xiao Chen is not available right now, would you be able to commit the revert when we're satisfied? I'll run tests with the reverts committed locally...
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Hey Sean, thanks for the comment.
          I think there's a lightweight approach to fix the same bug without includedNodes set. I will post a patch next week if I can do something clever.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Hey Sean, thanks for the comment. I think there's a lightweight approach to fix the same bug without includedNodes set. I will post a patch next week if I can do something clever.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Here's my proof of concept fix to eliminate includedNodes while fixing renamed snapshotted files getting counted twice.

          It uses FSDirectory.inodeMap to determine whether a deleted snapshotted inode is actually deleted or renamed, and whether the renamed inode remains under the same du subtree.

          The v001 patch assumes the du root is a directory. I have not yet considered the cases when files are truncated or appended, and I have not considered the case if there are symlinks in the directory.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Here's my proof of concept fix to eliminate includedNodes while fixing renamed snapshotted files getting counted twice. It uses FSDirectory.inodeMap to determine whether a deleted snapshotted inode is actually deleted or renamed, and whether the renamed inode remains under the same du subtree. The v001 patch assumes the du root is a directory. I have not yet considered the cases when files are truncated or appended, and I have not considered the case if there are symlinks in the directory.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Here's a rev 002 patch. Added a few asserts and checks to make sure it doesn't break.

          The patch passed all tests on CDH5.11 code, so I think it's ready for precommit check on trunk.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Here's a rev 002 patch. Added a few asserts and checks to make sure it doesn't break. The patch passed all tests on CDH5.11 code, so I think it's ready for precommit check on trunk.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 15m 24s trunk passed
          +1 compile 0m 55s trunk passed
          +1 checkstyle 0m 41s trunk passed
          +1 mvnsite 1m 2s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          -1 findbugs 1m 54s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
          +1 javadoc 0m 44s trunk passed
          +1 mvninstall 0m 52s the patch passed
          +1 compile 0m 58s the patch passed
          +1 javac 0m 58s the patch passed
          -0 checkstyle 0m 39s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 120 unchanged - 0 fixed = 121 total (was 120)
          +1 mvnsite 0m 56s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 58s the patch passed
          +1 javadoc 0m 40s the patch passed
          -1 unit 67m 7s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          96m 20s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.balancer.TestBalancerRPCDelay
            hadoop.hdfs.server.namenode.TestStartup
            hadoop.hdfs.server.namenode.TestMetadataVersionOutput
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ac17dc
          JIRA Issue HDFS-11661
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12865620/HDFs-11661.002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux f1e485d1ad22 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / fdf5192
          Default Java 1.8.0_121
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19236/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19236/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19236/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19236/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19236/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 13s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 mvninstall 15m 24s trunk passed +1 compile 0m 55s trunk passed +1 checkstyle 0m 41s trunk passed +1 mvnsite 1m 2s trunk passed +1 mvneclipse 0m 16s trunk passed -1 findbugs 1m 54s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings. +1 javadoc 0m 44s trunk passed +1 mvninstall 0m 52s the patch passed +1 compile 0m 58s the patch passed +1 javac 0m 58s the patch passed -0 checkstyle 0m 39s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 120 unchanged - 0 fixed = 121 total (was 120) +1 mvnsite 0m 56s the patch passed +1 mvneclipse 0m 13s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 58s the patch passed +1 javadoc 0m 40s the patch passed -1 unit 67m 7s hadoop-hdfs in the patch failed. +1 asflicense 0m 20s The patch does not generate ASF License warnings. 96m 20s Reason Tests Failed junit tests hadoop.hdfs.server.balancer.TestBalancerRPCDelay   hadoop.hdfs.server.namenode.TestStartup   hadoop.hdfs.server.namenode.TestMetadataVersionOutput   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure Subsystem Report/Notes Docker Image:yetus/hadoop:0ac17dc JIRA Issue HDFS-11661 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12865620/HDFs-11661.002.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux f1e485d1ad22 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / fdf5192 Default Java 1.8.0_121 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19236/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19236/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19236/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19236/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19236/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          yzhangal Yongjun Zhang added a comment -

          HI Wei-Chiu Chuang,

          Thanks for working on this, it seems a very good fix to solve the issue reported here. I did a review and below are my comments.

          1. There are the following scenarios:
          a. mv fileX fileY (fileY is a child of the same snapshot root)
          b. mv fileX fileY (fileY is not a child of the same snapshot root)
          c. mv fileX fileY, take snapshot, mv fileY fileX
          d. delete fileX
          where b/d can be handled same way, and a/c can be handled same way. HDFS-10797 already has a unit test, I'm not sure whether it covers all scenario, if not, we can extend the test to cover these scenarios. So to trace the behavior of the changed code to see if that the code works as expected for different scenarios.

          2. The following code covers scenario c, it seems the calculation computeContentSummary done in this if branch is already done before this code is reached, and we should not call it again here. Let's prove that with the unit test.

              if (nodePath.equals(inodeInMapPath)) {
                  LOG.info("snapshot inode " + nodePath + " was deleted or " +
                          "was renamed+renamed back.");
                  inodeInMap.computeContentSummary(Snapshot.CURRENT_STATE_ID, this);
                } else {
          

          Thanks.

          Show
          yzhangal Yongjun Zhang added a comment - HI Wei-Chiu Chuang , Thanks for working on this, it seems a very good fix to solve the issue reported here. I did a review and below are my comments. 1. There are the following scenarios: a. mv fileX fileY (fileY is a child of the same snapshot root) b. mv fileX fileY (fileY is not a child of the same snapshot root) c. mv fileX fileY, take snapshot, mv fileY fileX d. delete fileX where b/d can be handled same way, and a/c can be handled same way. HDFS-10797 already has a unit test, I'm not sure whether it covers all scenario, if not, we can extend the test to cover these scenarios. So to trace the behavior of the changed code to see if that the code works as expected for different scenarios. 2. The following code covers scenario c, it seems the calculation computeContentSummary done in this if branch is already done before this code is reached, and we should not call it again here. Let's prove that with the unit test. if (nodePath.equals(inodeInMapPath)) { LOG.info( "snapshot inode " + nodePath + " was deleted or " + "was renamed+renamed back." ); inodeInMap.computeContentSummary(Snapshot.CURRENT_STATE_ID, this ); } else { Thanks.
          Hide
          daryn Daryn Sharp added a comment -

          Please hold off on commit until later this week. There are more bugs related to snapshots and content summary and quota usage discrepencies. I almost have a patch ready that optimizes content summary and appears to fix the snapshot issues.

          Show
          daryn Daryn Sharp added a comment - Please hold off on commit until later this week. There are more bugs related to snapshots and content summary and quota usage discrepencies. I almost have a patch ready that optimizes content summary and appears to fix the snapshot issues.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Daryn Sharp I am testing my patch against a production cluster fsimage, and found some issues.

          But feel free to reassign this to you if you are working on a more complete fix.

          Thanks.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Daryn Sharp I am testing my patch against a production cluster fsimage, and found some issues. But feel free to reassign this to you if you are working on a more complete fix. Thanks.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          FYI the issue I found is related to HDFS-11515. That fix was not complete, so you might as well take that into account.

          Show
          jojochuang Wei-Chiu Chuang added a comment - FYI the issue I found is related to HDFS-11515 . That fix was not complete, so you might as well take that into account.
          Hide
          andrew.wang Andrew Wang added a comment -

          Daryn Sharp do you think we can get this turned around by EOW? I'm aiming to roll alpha3 on Friday. We can retarget for beta1 if that's too aggressive.

          Show
          andrew.wang Andrew Wang added a comment - Daryn Sharp do you think we can get this turned around by EOW? I'm aiming to roll alpha3 on Friday. We can retarget for beta1 if that's too aggressive.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          I don't want to have this one blocking Hadoop 3 alpha3. I suggest we revert both HDFS-11515 and HDFS-11661 to unblock.

          Show
          jojochuang Wei-Chiu Chuang added a comment - I don't want to have this one blocking Hadoop 3 alpha3. I suggest we revert both HDFS-11515 and HDFS-11661 to unblock.
          Hide
          shahrs87 Rushabh S Shah added a comment -

          I suggest we revert both HDFS-11515 and HDFS-11661 to unblock.

          Wei-Chiu Chuang: By HDFS-11661, you mean HDFS-10797 correct ?

          Show
          shahrs87 Rushabh S Shah added a comment - I suggest we revert both HDFS-11515 and HDFS-11661 to unblock. Wei-Chiu Chuang : By HDFS-11661 , you mean HDFS-10797 correct ?
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Rushabh S Shah
          My bad. You're absolutely correct.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Rushabh S Shah My bad. You're absolutely correct.
          Hide
          daryn Daryn Sharp added a comment -

          Sorry, thought I commented last week. I agree we should just revert the original change entirely. I may/should/hopefully have a patch by EOW, but don't let it block the release. We've "lived" with the inconsistency this long so waiting a bit longer won't hurt. Between debugging 2.8 and grokking snapshots, fixing the discrepancies is taking longer than expected.

          Show
          daryn Daryn Sharp added a comment - Sorry, thought I commented last week. I agree we should just revert the original change entirely. I may/should/hopefully have a patch by EOW, but don't let it block the release. We've "lived" with the inconsistency this long so waiting a bit longer won't hurt. Between debugging 2.8 and grokking snapshots, fixing the discrepancies is taking longer than expected.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Heads up I am going to revert both commits EOD. I believe I've got sufficient number of upvotes.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Heads up I am going to revert both commits EOD. I believe I've got sufficient number of upvotes.
          Hide
          xiaochen Xiao Chen added a comment -

          Just returning from paternity leave, sorry for the inconvenience and thanks Wei-Chiu for working on the revert. Please let me know if you need help on reverting.
          Also love to look for a fix and the Daryn patch, but +1 to unblock the release first.

          Show
          xiaochen Xiao Chen added a comment - Just returning from paternity leave, sorry for the inconvenience and thanks Wei-Chiu for working on the revert. Please let me know if you need help on reverting. Also love to look for a fix and the Daryn patch, but +1 to unblock the release first.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Based on multiple +1, I reverted the commit from branch-2.8, branch-2 and trunk.

          Thanks to Nathan Roberts for reporting the issue, and comments from Kihwal Lee, Sean Mackrory, Xiao Chen Junping Du Andrew Wang Rushabh S Shah Yongjun Zhang and Daryn Sharp.

          Daryn Sharp thanks for your effort trying to fix the bug. Please file a new jira for your patch. Thanks!

          Show
          jojochuang Wei-Chiu Chuang added a comment - Based on multiple +1, I reverted the commit from branch-2.8, branch-2 and trunk. Thanks to Nathan Roberts for reporting the issue, and comments from Kihwal Lee , Sean Mackrory , Xiao Chen Junping Du Andrew Wang Rushabh S Shah Yongjun Zhang and Daryn Sharp . Daryn Sharp thanks for your effort trying to fix the bug. Please file a new jira for your patch. Thanks!
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          2.8.1 became a security release. Moving fix-version to 2.8.2 after the fact.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - 2.8.1 became a security release. Moving fix-version to 2.8.2 after the fact.

            People

            • Assignee:
              jojochuang Wei-Chiu Chuang
              Reporter:
              nroberts Nathan Roberts
            • Votes:
              0 Vote for this issue
              Watchers:
              20 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development