Hadoop HDFS / HDFS-11881

NameNode consumes a lot of memory for snapshot diff report generation

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: hdfs, snapshots
    • Labels:
      None

      Description

      Problem:
      HDFS supports a snapshot diff tool which can generate a detailed report of modified, created, deleted and renamed files between any 2 snapshots.

      hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
      

      However, if the diff list between 2 snapshots is huge, on the order of millions of entries, the NameNode can consume a lot of memory while generating the diff report. In a few cases, we have seen the NameNode go into a long GC pause lasting a few minutes to make room for this burst in memory demand during snapshot diff report generation.

      Root cause:

      • NameNode tries to generate the diff report with all diff entries at once, which puts undue pressure on the heap.
      • Each diff report entry has, at minimum, the diff type (enum), a source path byte array, and a destination path byte array. Take the file deletion use case: each diff report entry carries only one of the two paths (source or destination). Assuming the deleted files' paths average 128 bytes, 4 million file deletions captured in a diff report need about 512MB of memory for the paths alone.
      • The snapshot diff report uses a plain Java ArrayList, which reallocates a larger contiguous backing array (about 1.5x the old capacity in OpenJDK) and copies every element reference over each time it fills up. So a report that needs about 512MB for its entries also asks for ever larger contiguous chunks for the backing array as it grows, with the old and new arrays alive at the same time during each copy.
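The figures in the root cause list can be checked with a quick back-of-the-envelope calculation. The sketch below is illustrative only: the class and method names are made up for this example, the initial capacity of 10 and the 1.5x growth step are assumptions based on OpenJDK's ArrayList, and it counts path bytes only (no object headers or references).

```java
import java.util.ArrayList;
import java.util.List;

/** Back-of-the-envelope estimate for the figures quoted above (illustrative only). */
final class DiffReportMemoryEstimate {

  /** Path bytes alone: number of entries times the average path length. */
  static long pathBytes(long entries, int avgPathBytes) {
    return entries * avgPathBytes;
  }

  /**
   * Capacities a growing array passes through when it starts at {@code initial}
   * and grows by 50% each time it is full (assumed OpenJDK ArrayList policy).
   */
  static List<Integer> growthSteps(int initial, int target) {
    List<Integer> steps = new ArrayList<>();
    int capacity = initial;
    steps.add(capacity);
    while (capacity < target) {
      capacity = capacity + (capacity >> 1); // old capacity + 50%
      steps.add(capacity);
    }
    return steps;
  }

  public static void main(String[] args) {
    long bytes = pathBytes(4_000_000L, 128);
    System.out.println("path bytes: " + bytes + " (~" + bytes / 1_000_000 + " MB)");

    List<Integer> steps = growthSteps(10, 4_000_000);
    System.out.println("reallocations: " + (steps.size() - 1)
        + ", final capacity: " + steps.get(steps.size() - 1));
  }
}
```

Each reallocation needs one contiguous array for the new capacity while the old array is still reachable, which is the allocation pattern the third bullet is concerned about.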

      Proposal:

      • Make the NameNode snapshot diff report service follow the batch model (like the directory listing service). Clients (the hdfs snapshotDiff command) will then receive the diff report in small batches and iterate several times to get the full list.
      • Additionally, the snapshot diff report service in the NameNode can use the ChunkedArrayList data structure instead of the current ArrayList, avoiding heap fragmentation and the need for large contiguous memory chunks.
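To make the batch model in the first proposal concrete, here is a minimal sketch of cursor-based paging. All names (BatchedDiffSketch, Batch, getBatch, consumeAll) are hypothetical and are not HDFS APIs. For brevity the "server" slices an in-memory list; a real service would presumably compute each batch from the cursor instead of materializing the full report.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative cursor-based paging; every name here is hypothetical, not an HDFS API. */
final class BatchedDiffSketch {

  /** One batch of entries plus the index to resume from (-1 when nothing is left). */
  static final class Batch {
    final List<String> entries;
    final int nextIndex;

    Batch(List<String> entries, int nextIndex) {
      this.entries = entries;
      this.nextIndex = nextIndex;
    }
  }

  /** "Server" side: return at most batchSize entries starting at startIndex. */
  static Batch getBatch(List<String> allEntries, int startIndex, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int end = Math.min(startIndex + batchSize, allEntries.size());
    List<String> slice = new ArrayList<>(allEntries.subList(startIndex, end));
    int next = end < allEntries.size() ? end : -1;
    return new Batch(slice, next);
  }

  /** "Client" side: keep requesting batches until the server reports no more. */
  static int consumeAll(List<String> allEntries, int batchSize) {
    int seen = 0;
    int cursor = 0;
    while (cursor >= 0) {
      Batch batch = getBatch(allEntries, cursor, batchSize);
      seen += batch.entries.size(); // a real client would print or process entries here
      cursor = batch.nextIndex;
    }
    return seen;
  }

  public static void main(String[] args) {
    List<String> entries = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      entries.add("DELETE /dir/file" + i);
    }
    System.out.println("entries received: " + consumeAll(entries, 3));
  }
}
```

The point of the design is that neither side ever holds more than one batch of entries per request, so peak memory depends on the batch size rather than on the size of the whole diff.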
      1. 1_ChunkedArrayList_SnapshotDiffReport.png
        148 kB
        Manoj Govindassamy
      2. 2_ArrayList_SnapshotDiffReport.png
        146 kB
        Manoj Govindassamy
      3. HDFS-11881.01.patch
        7 kB
        Manoj Govindassamy

          Activity

          manojg Manoj Govindassamy added a comment -

          Attaching patch v01 to address the following. Wei-Chiu Chuang / Yongjun Zhang, can you please take a look at the patch?

          1. Changed SnapshotDiffInfo#generateReport to use ChunkedArrayList instead of ArrayList. It iterates over the diffMap entries, constructs diffReportEntry and adds it to the chunked array list.
          2. Updated PBHelperClient#convert() on both the client and server side to use ChunkedArrayList instead of ArrayList.
          3. Updated TestSnapshotCommands to verify that the snapshotDiff shell command works as expected with the chunked array list.
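The idea behind item 1 can be pictured with a small stand-in for a chunked list. This is a simplified sketch, not Hadoop's ChunkedArrayList: the class name SimpleChunkedList and the fixed chunk capacity are assumptions made for illustration, and the real class differs in its details.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Simplified stand-in for a chunked list (hypothetical; not Hadoop's implementation). */
final class SimpleChunkedList<T> implements Iterable<T> {
  private final int chunkCapacity;
  private final List<List<T>> chunks = new ArrayList<>();
  private List<T> lastChunk;
  private int size;

  SimpleChunkedList(int chunkCapacity) {
    if (chunkCapacity <= 0) {
      throw new IllegalArgumentException("chunkCapacity must be positive");
    }
    this.chunkCapacity = chunkCapacity;
  }

  /** Append only: when the last chunk is full, start a new bounded chunk. */
  void add(T item) {
    if (lastChunk == null || lastChunk.size() >= chunkCapacity) {
      lastChunk = new ArrayList<>(chunkCapacity);
      chunks.add(lastChunk);
    }
    lastChunk.add(item);
    size++;
  }

  int size() {
    return size;
  }

  int chunkCount() {
    return chunks.size();
  }

  /** Iterate chunk by chunk, preserving insertion order. */
  @Override
  public Iterator<T> iterator() {
    final Iterator<List<T>> outer = chunks.iterator();
    return new Iterator<T>() {
      private Iterator<T> inner = Collections.emptyIterator();

      @Override
      public boolean hasNext() {
        while (!inner.hasNext() && outer.hasNext()) {
          inner = outer.next().iterator();
        }
        return inner.hasNext();
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return inner.next();
      }
    };
  }

  public static void main(String[] args) {
    SimpleChunkedList<String> report = new SimpleChunkedList<>(4);
    for (int i = 0; i < 10; i++) {
      report.add("DELETE /dir/file" + i);
    }
    System.out.println(report.size() + " entries in " + report.chunkCount() + " chunks");
  }
}
```

No single allocation here grows with the total number of entries except the small list of chunk references, which is why this layout is gentler on a fragmented heap than one large backing array.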

          manojg Manoj Govindassamy added a comment -

          Let's use this jira to fix the high memory usage issue via the ChunkedArrayList approach, as in proposal #2. Will track proposal #1 in a new jira.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem   Runtime   Comment
          0     reexec      0m 18s    Docker mode activated.
          +1    @author     0m 0s     The patch does not contain any @author tags.
          +1    test4tests  0m 0s     The patch appears to include 1 new or modified test files.
          0     mvndep      0m 28s    Maven dependency ordering for branch
          +1    mvninstall  13m 58s   trunk passed
          +1    compile     1m 26s    trunk passed
          +1    checkstyle  0m 41s    trunk passed
          +1    mvnsite     1m 28s    trunk passed
          +1    findbugs    3m 4s     trunk passed
          +1    javadoc     1m 2s     trunk passed
          0     mvndep      0m 7s     Maven dependency ordering for patch
          +1    mvninstall  1m 20s    the patch passed
          +1    compile     1m 24s    the patch passed
          +1    javac       1m 24s    the patch passed
          -0    checkstyle  0m 38s    hadoop-hdfs-project: The patch generated 4 new + 79 unchanged - 0 fixed = 83 total (was 79)
          +1    mvnsite     1m 23s    the patch passed
          +1    whitespace  0m 0s     The patch has no whitespace issues.
          +1    findbugs    3m 13s    the patch passed
          +1    javadoc     0m 56s    the patch passed
          +1    unit        1m 11s    hadoop-hdfs-client in the patch passed.
          -1    unit        99m 56s   hadoop-hdfs in the patch failed.
          +1    asflicense  0m 25s    The patch does not generate ASF License warnings.
                            134m 30s



          Reason                 Tests
          Failed junit tests     hadoop.hdfs.server.datanode.TestDirectoryScanner
                                 hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
          Timed out junit tests  org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean



          Subsystem       Report/Notes
          Docker          Image:yetus/hadoop:14b5c93
          JIRA Issue      HDFS-11881
          JIRA Patch URL  https://issues.apache.org/jira/secure/attachment/12872786/HDFS-11881.01.patch
          Optional Tests  asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname           Linux 26f4e5d8ef81 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool      maven
          Personality     /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision    trunk / b3d3ede
          Default Java    1.8.0_131
          findbugs        v3.1.0-RC1
          checkstyle      https://builds.apache.org/job/PreCommit-HDFS-Build/19887/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          unit            https://builds.apache.org/job/PreCommit-HDFS-Build/19887/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results    https://builds.apache.org/job/PreCommit-HDFS-Build/19887/testReport/
          modules         C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output  https://builds.apache.org/job/PreCommit-HDFS-Build/19887/console
          Powered by      Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          jojochuang Wei-Chiu Chuang added a comment -

          Hi Manoj, thanks for working on this patch. Good finding and looks like a good improvement.
          One quick question: it seems the scope of the patch is relatively limited to your #2 (ChunkedArrayList in place of ArrayList). Do you plan to address your #1 (report snapshot diff in batches)?

          jojochuang Wei-Chiu Chuang added a comment -

          Ah never mind... I saw your comment.

          jojochuang Wei-Chiu Chuang added a comment -

          I am not sure it's necessary to add new tests. It looks to me that the existing tests in TestSnapshotDiffReport cover most code paths. Maybe it's easier to update one of the tests there to create 100 more files.

          Also, have you done any kind of heap size measurement against a real cluster of substantial size before/after this patch?
          I have not seen a profile myself, so I hesitate to say that changing the data structures in these few places is sufficient to improve heap usage.

          Thanks.

          jojochuang Wei-Chiu Chuang added a comment -

          So for example, Diff#insert uses ArrayList to store created and deleted inodes. Considering that a directory might have millions of created/deleted inodes in a snapshot, there is a potential upside to converting these lists to ChunkedArrayList. This is just a suggestion and I want to be cautious here: any sort of heap usage optimization should be accompanied by a real cluster benchmark.

          manojg Manoj Govindassamy added a comment -

          Thanks for the review, Wei-Chiu Chuang.

          > It looks to me that existing tests in TestSnapshotDiffReport cover most code paths. Maybe it's easier to update one of the tests there to create 100 more files.

          TestSnapshotDiffReport tests the diff report through the HDFS API. But in the context of the GC problem we are trying to solve here, we want the diff command to be run from the shell. TestSnapshotCommands already tests snapshot commands from the shell and looks like a better place for the new diff command test.

          > have you done any kind of heap size measurement against a real cluster of substantial size before/after this patch?

          I was planning to do memory profiling for the broader fix. This particular jira is more of a short-term, quick-fix approach using already existing and proven methods. Judging by the code and comments, ChunkedArrayList is far better than ArrayList in terms of memory usage. This quick fix might not be sufficient to solve all the GC problems around the snapshot diff report, but it definitely helps mitigate the problem.

          > Diff#insert uses ArrayList to store created and deleted inodes. Considering that a directory might have millions of created/deleted inodes in a snapshot, there is a potential upside to convert these lists to ChunkedArrayList.

          That's right, the DirectoryDiff list is still an ArrayList and suffers from the contiguous memory allocation issue. Will convert these to ChunkedArrayList.

          manojg Manoj Govindassamy added a comment -

          Wei-Chiu Chuang / Yongjun Zhang,
          Wrote a test that puts 500K files in the snapshot diff report and runs the snapshot diff shell command 100+ times to see how the heap gets fragmented and how often full GCs occur. Attached heap graphs for both the ArrayList and ChunkedArrayList based implementations of SnapshotDiffReport. The ArrayList version needs frequent long GCs to clear up the heap and make room for the new report, whereas the ChunkedArrayList based SnapshotDiffReport needed fewer full GCs for the same test. If we scale this test to a 10G+ SnapshotDiffReport, the heap usage and full GC requirements of the ArrayList based approach are expected to be an order of magnitude higher than with ChunkedArrayList.

          Tried a similar ChunkedArrayList approach for DirDiff, but soon realized that DirDiff uses far more of the diff list's functionality, such as add by index, remove by index, set by index, etc. None of these index-based operations are currently supported in ChunkedArrayList. So, will take up this bigger task in a separate jira.
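To illustrate why index-based operations are extra work on a chunked layout, here is a hypothetical sketch (ChunkIndexSketch and its methods are made up for this example and say nothing about how Hadoop implements or will implement them). Once elements can be removed from the middle, chunks are no longer evenly filled, so finding index i means walking the chunks and subtracting their sizes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical index lookup over a list of chunks; not Hadoop code. */
final class ChunkIndexSketch {

  /** Walk the chunks, subtracting each chunk's size until the index falls inside one. */
  static <T> T get(List<List<T>> chunks, int index) {
    if (index < 0) {
      throw new IndexOutOfBoundsException("index: " + index);
    }
    int remaining = index;
    for (List<T> chunk : chunks) {
      if (remaining < chunk.size()) {
        return chunk.get(remaining);
      }
      remaining -= chunk.size();
    }
    throw new IndexOutOfBoundsException("index: " + index);
  }

  /** Remove by index: same walk, then shrink only the chunk that holds the element. */
  static <T> T remove(List<List<T>> chunks, int index) {
    if (index < 0) {
      throw new IndexOutOfBoundsException("index: " + index);
    }
    int remaining = index;
    for (List<T> chunk : chunks) {
      if (remaining < chunk.size()) {
        return chunk.remove(remaining); // leaves this chunk shorter than its neighbours
      }
      remaining -= chunk.size();
    }
    throw new IndexOutOfBoundsException("index: " + index);
  }

  public static void main(String[] args) {
    List<List<String>> chunks = new ArrayList<>();
    chunks.add(new ArrayList<>(Arrays.asList("a", "b", "c")));
    chunks.add(new ArrayList<>(Arrays.asList("d", "e")));
    chunks.add(new ArrayList<>(Arrays.asList("f")));

    System.out.println("index 3 before remove: " + get(chunks, 3));
    System.out.println("removed at index 1: " + remove(chunks, 1));
    System.out.println("index 3 after remove: " + get(chunks, 3));
  }
}
```

Lookup cost grows with the number of chunks rather than being constant as in ArrayList, which is one trade-off a DirDiff conversion would have to weigh.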

          Can you please review patch v01 in the context of the FileDiff improvements alone for the SnapshotDiffReport use case?

          jojochuang Wei-Chiu Chuang added a comment -

          LGTM.

          Thanks a lot. The chart shows heap usage grows more rapidly with ArrayList, therefore causing more FGCs.
          I plan to commit this patch by end of tomorrow if there are no objections.

          jojochuang Wei-Chiu Chuang added a comment -

          Committed the patch to trunk, branch-2.9 and branch-2.8. Thanks Manoj Govindassamy for the contribution!

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11950 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11950/)
          HDFS-11881. NameNode consumes a lot of memory for snapshot diff report (weichiu: rev 16c8dbde574f49827fde5ee9add1861ee65d4645)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotDiffInfo.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestSnapshotCommands.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
          manojg Manoj Govindassamy added a comment -

          Thanks for the review and commit help Wei-Chiu Chuang.


            People

            • Assignee: Manoj Govindassamy (manojg)
            • Reporter: Manoj Govindassamy (manojg)
            • Votes: 1
            • Watchers: 11
