HDFS-4461: DirectoryScanner: volume path prefix takes up memory for every block that is scanned

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.3-alpha
    • Fix Version/s: 2.1.0-beta, 0.23.11
    • Component/s: None
    • Labels: None

      Description

      In the DirectoryScanner, we create a ScanInfo object for every block. This object contains two File objects: one for the metadata file and one for the block file. Since those File objects contain full paths, users who pick a lengthy path for their volume roots end up paying an extra path_prefix bytes for every block scanned, or N_blocks * path_prefix bytes in total. We also don't really need to store File objects at all; storing strings and creating File objects only as needed would be cheaper. This would be a nice efficiency improvement.
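
      To make the idea concrete, here is a minimal sketch (not the actual HDFS patch; the class and field names are made up) of a ScanInfo-like record that keeps one shared volume root plus short relative suffixes and builds File objects only on demand:

      import java.io.File;

      // Simplified sketch: share the volume root across all blocks on a volume and
      // keep only short relative suffixes per block, constructing File objects lazily.
      class ScanInfoSketch {
        private final long blockId;
        private final File volumeRoot;     // shared by every block on this volume
        private final String blockSuffix;  // e.g. "current/finalized/blk_123", relative to volumeRoot
        private final String metaSuffix;   // e.g. "current/finalized/blk_123_1001.meta"

        ScanInfoSketch(long blockId, File volumeRoot, String blockSuffix, String metaSuffix) {
          this.blockId = blockId;
          this.volumeRoot = volumeRoot;
          this.blockSuffix = blockSuffix;
          this.metaSuffix = metaSuffix;
        }

        // The long volume prefix is never duplicated per block; it is re-attached
        // only when a caller actually asks for a File.
        File getBlockFile() {
          return blockSuffix == null ? null : new File(volumeRoot, blockSuffix);
        }

        File getMetaFile() {
          return metaSuffix == null ? null : new File(volumeRoot, metaSuffix);
        }

        long getBlockId() {
          return blockId;
        }
      }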

      Attachments

      1. HDFS-4461.branch-0.23.patch
        11 kB
        Kihwal Lee
      2. HDFS-4661.006.patch
        10 kB
        Colin Patrick McCabe
      3. HDFS-4461.004.patch
        6 kB
        Colin Patrick McCabe
      4. HDFS-4461.003.patch
        4 kB
        Colin Patrick McCabe
      5. HDFS-4461.002.patch
        4 kB
        Colin Patrick McCabe
      6. memory-analysis.png
        140 kB
        Colin Patrick McCabe


          Activity

          Colin Patrick McCabe added a comment -

          Here's a before vs. after picture of a memory analysis. You can see that in the "after" picture, we are no longer storing the path prefix twice per block in the ScanInfo class.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567472/memory-analysis.png
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3927//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          It tried to apply the PNG as a patch... re-uploading the patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567469/002.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3926//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3926//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3926//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          fix findbugs issue

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567474/HDFS-4461.002.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3928//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3928//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3928//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567501/HDFS-4461.003.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3931//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3931//console

          This message is automatically generated.

          Suresh Srinivas added a comment -

          This has been causing out-of-memory conditions for users who pick such long volume paths.

          I doubt that the directory scanner is the cause of the OOM error. It is probably happening due to some other issue. How many blocks per storage directory did you have when the OOME happened?

          here's a before vs. after picture of a memory analysis. you can see that in the "after" picture, we are no longer storing the path prefix twice per block in the ScanInfo class

          I have a hard time understanding the picture. How many bytes are we saving per ScanInfo?

          Colin Patrick McCabe added a comment -

          I doubt that the directory scanner is the cause of OOM error. It is probably happening due to some other issue. How many blocks per storage directory do you have, when OOME happened?

          We analyzed a DN heap dump from a production cluster with Eclipse Memory Analyzer and found that the memory was full of ScanInfo objects. The memory histogram showed that java.lang.String was the third-largest consumer of memory in the system. Unfortunately I can't share the heap dump.

          I have hard time understanding the picture. How many bytes are we saving per ScanInfo?

          In the particular case shown in memory-analysis.png, we save 86 characters in each string. The volume prefix that we avoid storing is /home/cmccabe/hadoop4/hadoop-hdfs-project/hadoop-hdfs/build//test/data/dfs/data/data1/. Java uses 2 bytes per character (UCS-2 encoding), and we store both metaPath and blockPath, so multiply 86 by 4 to get 344 bytes. Then add the overhead of the two File objects that wrap the path strings instead of just the strings themselves-- probably around an extra 16 bytes per object, for roughly 376 bytes saved per ScanInfo in total.
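
          For reference, the arithmetic above written out as a small illustrative program (the 16-byte per-File overhead is the rough estimate from this comment, not a measured value):

          // Illustrative arithmetic for the per-ScanInfo savings described above.
          public class ScanInfoSavings {
            public static void main(String[] args) {
              int prefixChars = 86;            // characters in the volume prefix from memory-analysis.png
              int bytesPerChar = 2;            // Java stores String characters as 2 bytes each
              int pathsPerScanInfo = 2;        // both metaPath and blockPath carried the prefix
              int fileOverheadPerObject = 16;  // rough extra cost of wrapping each path in a File

              int stringBytes = prefixChars * bytesPerChar * pathsPerScanInfo;    // 344
              int total = stringBytes + pathsPerScanInfo * fileOverheadPerObject; // 376

              System.out.println("Approximate bytes saved per ScanInfo: " + total);
            }
          }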

          You might think that /home/cmccabe/hadoop4/hadoop-hdfs-project/hadoop-hdfs/build//test/data/dfs/data/data1/ is an unrealistically long volume path, but here is an example of a real volume path in use on a production cluster:

          /mnt/hdfs/hdfs01/10769eef-a23a-4300-b45b-749221786109/dfs/dn.

          Putting the disk UUID into the volume is an obvious thing to do if you're a system administrator.

          Suresh Srinivas added a comment -

          we analyzed a DN heap dump from a production cluster with eclipse memory analyzer and found that the memory was full of ScanInfo objects. The memory histogram showed that java.lang.String was the third-largest consumer of memory in the system. Unfortunately I can't share the heap dump.

          A server generally has a lot of String objects. There are also File objects in ReplicasMap and string paths tracked in many other places as well.

          This patch indeed saves a few bytes. However, I do not think this is the cause of the OOME, nor is it likely to solve that issue. ScanInfo is a short-lived object, unlike other data structures that are long-lived.

          Can you answer the following question, which I asked previously:

          How many blocks per storage directory do you have, when OOME happened?

          Or at least the number of ScanInfo objects you saw.

          Colin Patrick McCabe added a comment -

          If someone is running with around 200,000 blocks (a reasonable number) and a 50 to 80 character path, this change saves between 50 and 100 MB of heap space during the DirectoryScanner run. That's what we should be focusing on here-- the efficiency improvement. After all, that is why I marked this JIRA as "improvement" rather than "bug".

          Or at least the number of ScanInfo objects you saw.

          I saw more than 1 million ScanInfo objects. This means that either the number of blocks on the DN is much higher than we recommend, or there is another leak in the DirectoryScanner. I am trying to get confirmation that the number of blocks is really that high. If it isn't, then we will start looking more closely for memory leaks in the scanner.

          We've found that the block scanner often delivers the finishing blow to DNs that are already overloaded. This makes sense-- if your heap is already near max size, asking you to allocate a few hundred megabytes might finish you off.

          Suresh Srinivas added a comment -

          If someone is running with around 200,000 blocks (a reasonable number), and a 50 to 80 character path, this change saves between 50 and 100 MB of heap space during the DirectoryScanner run. That's what we should be focusing on here-- the efficiency improvement. After all, that is why I marked this JIRA as "improvement" rather than "bug"

          I think you are missing the point I made earlier. In the description you say:

          This has been causing out-of-memory conditions for users who pick such long volume paths.

          It is not correct to attribute the OOM to the memory inefficiency of DirectoryScanner. So please update the description to say that DirectoryScanner can be made more memory-efficient.

          I saw more than 1 million ScanInfo objects

          I am interested in seeing the number of blocks in this particular setup and if we are leaking these objects.

          I am leaning more towards incorrect datanode configuration in the setup where you saw the OOM. Can you provide details on the heap size of the datanode, the number of blocks on the datanode, etc.?

          Todd Lipcon added a comment -

          Looks like we can cut the memory usage in half again – storing both the meta file path and the block file path is redundant, since you can always compute the block path from the meta path by chopping off the "_<genstamp>.meta" suffix.
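
          As an illustrative sketch of that suffix-stripping idea (not code from the patch; the class and method names are made up):

          import java.util.regex.Matcher;
          import java.util.regex.Pattern;

          class MetaToBlockPath {
            // Matches the trailing "_<genstamp>.meta" portion of a meta file path.
            private static final Pattern META_SUFFIX = Pattern.compile("_\\d+\\.meta$");

            // Recover the block file path by stripping the meta suffix, so only
            // one of the two paths needs to be stored per block.
            static String blockPathFromMetaPath(String metaPath) {
              Matcher m = META_SUFFIX.matcher(metaPath);
              return m.find() ? metaPath.substring(0, m.start()) : metaPath;
            }

            public static void main(String[] args) {
              // Prints "current/finalized/blk_1073741825"
              System.out.println(blockPathFromMetaPath("current/finalized/blk_1073741825_1001.meta"));
            }
          }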

          Suresh – we routinely see users with millions of replicas per DN now that 48TB+ configurations have become commodity. Sure, we should also encourage users to use things like HAR to coalesce into larger blocks, but easy wins on DN memory usage are a no-brainer IMO.

          Andy Isaacson added a comment -

          A server generally has a lot of String objects. There are also file objects in ReplicasMap, string paths tracked in many other places as well.

          The cluster in question has about 1.5 million blocks per DN, across 12 datadirs. This hprof shows 1,858,340 BlockScanInfo objects. MAT computed the "Retained Heap" of FsDatasetImpl at 980 MB and the "Retained Heap" of the DirectoryScanner thread at 1.4 GB.

          ScanInfo is a short lived object, unlike other data structures that are long lived.

          It doesn't matter how narrow the peak is, if it exceeds the maximum permissible value. In this case we seem to have a complete set of ScanInfo objects (for the entire dataset) active on the heap, with the DirectoryScanner thread in the process of reconcile()ing them when it OOMs.

          Andy Isaacson added a comment -

          The actual OOM backtrace is on the DN thread:

            at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:25)
            at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat()Lorg/apache/hadoop/hdfs/server/protocol/HeartbeatResponse; (BPServiceActor.java:434)
            at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService()V (BPServiceActor.java:520)
            at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run()V (BPServiceActor.java:673)
            at java.lang.Thread.run()V (Thread.java:662)
          
          Suresh Srinivas added a comment -

          I think my earlier comments were perhaps not clear. Let me give it another try.

          +1 for optimizing the data structures in datanode.

          Suresh – we routinely see users with millions of replicas per DN now that 48TB+ configurations have become commodity. Sure, we should also encourage users to use things like HAR to coalesce into larger blocks, but easy wins on DN memory usage are a no-brainer IMO.

          This is again not the point I am making. I know and understand that the number of blocks per DN is growing. Data structures in the datanode need to be optimized. At the same time, as DNs support more storage, the DN heap also needs to be suitably increased.

          My previous comments relate to the assertion that DirectoryScanner is causing the OOM. The OOM is not caused by the scanner; it is caused by incorrectly sizing the datanode JVM heap, unless one shows a leak in DirectoryScanner. So the comment was a request to edit the description to reflect that.

          We also need to optimize the long-lived data structures in the datanode. I thought one would start with that instead of DirectoryScanner, which creates short-lived objects. Created HDFS-4465 to track that.

          Harsh J added a comment -

          Can a general memory increase between 1.0 -> 2.0 also be attributed to the use of the lengthy block pool ID in the file paths? A minor regression, but very clearly present in my experience.

          Colin Patrick McCabe added a comment -

          Looks like we can cut the memory usage in half again – storing both the meta file path and the block file path is redundant, since you can always compute the block path from the meta path by chopping off the "_<genstamp>.meta" suffix.

          I'll see if I can do that...

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12570534/HDFS-4461.004.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3998//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3998//console

          This message is automatically generated.

          Todd Lipcon added a comment -

          FWIW, we are running into this again with another customer who has a large number of blocks per DN. The heap usage is constant at around 4GB until the directory scanner runs. This causes a doubling of the required heap in the DN, which caused an OOM. We've worked around it by bumping the heap even higher, but we should address this issue. The fact that the usage is short-lived doesn't matter, since the heap size has to be set according to the max heap, not the average heap.

          Todd Lipcon added a comment -

          A few quick comments on the patch:

          +      return path.replaceAll("(?<!^)(\\\\|/){2,}",
          +          Matcher.quoteReplacement(File.separator));
          

          Can you introduce a constant Pattern instance for this regex? That way you'll avoid the overhead of recompiling the regex each time this is called.
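
          A minimal sketch of that suggestion, assuming a helper like the one in the quoted diff (the class and method names here are illustrative, not taken from the patch):

          import java.io.File;
          import java.util.regex.Matcher;
          import java.util.regex.Pattern;

          class PathCondenser {
            // Compile the regex once; String.replaceAll() would recompile it on every call.
            private static final Pattern CONDENSED_PATH_REGEX =
                Pattern.compile("(?<!^)(\\\\|/){2,}");

            // Collapse runs of path separators (except at the start) into File.separator.
            static String getCondensedPath(String path) {
              return CONDENSED_PATH_REGEX.matcher(path)
                  .replaceAll(Matcher.quoteReplacement(File.separator));
            }
          }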

          • Can you add comments to the blockSuffix and metaSuffix members about their behavior when null? If I understand correctly, it would be something like:
          /**
           * The block file path, relative to the volume's base directory.
           * If there was no block file found, this may be null. If 'vol'
           * is null, then this is the full path of the block file.
           */
          private final String blockSuffix;
          
          /**
           * The suffix of the meta file path relative to the block file.
           * If blockSuffix is null, then this will be the entire path relative
           * to the volume base directory, or an absolute path if vol is also
           * null.
           */
          private final String metaSuffix;
          
          • Maybe add a simple unit test for ScanInfo to test the various combinations, making sure that you get back the same paths that you put in?
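
          A sketch of such a round-trip test, assuming a ScanInfo constructor that takes a block ID, block file, meta file, and volume, with matching getters, and that a null volume means full paths are stored (per the comments quoted above); the exact signature, visibility, and test class name are assumptions:

          import java.io.File;
          import org.junit.Test;
          import static org.junit.Assert.assertEquals;

          public class TestScanInfoRoundTrip {
            @Test
            public void testFullPathsRoundTripWithNullVolume() {
              File blockFile = new File("/data1/current/finalized/blk_123");
              File metaFile = new File("/data1/current/finalized/blk_123_1001.meta");

              // With a null volume, ScanInfo is expected to hand back the same
              // absolute paths that were passed in.
              DirectoryScanner.ScanInfo info =
                  new DirectoryScanner.ScanInfo(123L, blockFile, metaFile, null);

              assertEquals(blockFile, info.getBlockFile());
              assertEquals(metaFile, info.getMetaFile());
            }
          }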
          Colin Patrick McCabe added a comment -
          • use constant Pattern (optimization)
          • add comments
          • add ScanInfo test to TestDirectoryScanner
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12588456/HDFS-4661.006.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4532//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4532//console

          This message is automatically generated.

          Todd Lipcon added a comment -

          +1

          Colin Patrick McCabe added a comment -

          committed to trunk / branch-2

          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3975 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3975/)
          HDFS-4461. DirectoryScanner: volume prefix takes up memory for every block that is scanned (Colin Patrick McCabe) (Revision 1494401)

          Result = SUCCESS
          cmccabe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1494401
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #245 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/245/)
          HDFS-4461. DirectoryScanner: volume prefix takes up memory for every block that is scanned (Colin Patrick McCabe) (Revision 1494401)

          Result = FAILURE
          cmccabe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1494401
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1435 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1435/)
          HDFS-4461. DirectoryScanner: volume prefix takes up memory for every block that is scanned (Colin Patrick McCabe) (Revision 1494401)

          Result = FAILURE
          cmccabe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1494401
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1462 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1462/)
          HDFS-4461. DirectoryScanner: volume prefix takes up memory for every block that is scanned (Colin Patrick McCabe) (Revision 1494401)

          Result = FAILURE
          cmccabe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1494401
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java
          Kihwal Lee added a comment -

          I thought we could wait till 2.x, but some 0.23 users are creating a lot of small files (i.e. small blocks) and DNs are running out of memory when DirectoryScanner runs. The peak heap usage can be almost 2x or even 3x the base usage if garbage from one directory scan survives until the next scan.

          The patch is a straight back-port of the trunk version. The difference comes from the fact that a source file got split into multiple files in branch-2/trunk. Other than that, the core change is exactly the same.

          Colin Patrick McCabe added a comment -

          +1 for the backport. Note that I have not tested it, just reviewed it.

          Kihwal Lee added a comment -

          Thanks, Colin. I've checked it into branch-0.23.


            People

            • Assignee: Colin Patrick McCabe
            • Reporter: Colin Patrick McCabe
            • Votes: 0
            • Watchers: 16
