Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Catalog
    • Labels:
      None

      Description

      When processing a catalog metadata cache update, working memory usage can be up to 5x the final metadata object's memory footprint. If GC doesn't reclaim memory fast enough, Impala can crash with a JVM out-of-memory error.

      Most of this allocation comes from the HDFS client, specifically org.apache.hadoop.fs.Path construction:

      https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Path.java#L147
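
      The allocation pattern can be approximated with a small stdlib-only sketch (this mirrors what Path(Path, String) does, it is not the actual Hadoop code): each child name listed in a directory is resolved against the parent's java.net.URI, which is where the URI, StringBuilder, and String frames in the profile below come from.

      ```java
      import java.net.URI;

      // Rough sketch (not the actual Hadoop source) of what
      // Path(Path parent, String child) does for every file in a listing:
      // resolve the child name against the parent URI. Each call allocates
      // a new URI plus intermediate StringBuilder/String objects, so listing
      // N files produces O(N) short-lived garbage.
      public class PathResolveSketch {
          static URI resolveChild(URI parent, String child) {
              // Ensure the parent path ends with '/' so resolution keeps
              // the last path segment instead of replacing it.
              String p = parent.getPath();
              if (!p.endsWith("/")) {
                  parent = parent.resolve(p + "/");  // extra URI allocation
              }
              return parent.resolve(child);          // another URI allocation
          }

          public static void main(String[] args) {
              URI dir = URI.create("hdfs://nn:8020/warehouse/db/tbl");
              URI file = resolveChild(dir, "part-00000");
              System.out.println(file);  // hdfs://nn:8020/warehouse/db/tbl/part-00000
          }
      }
      ```

      With millions of files per table, this per-file resolution is what drives the TLAB pressure recorded below.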
       
      Stack Trace    Average Object Size (bytes)    Total TLAB Size (bytes)    Pressure (%)

      java.lang.Thread.run() 152.486 6,586,166,960 78.246
         java.util.concurrent.ThreadPoolExecutor$Worker.run() 152.959 6,583,034,136 78.208
            java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 152.959 6,583,034,136 78.208
               java.util.concurrent.FutureTask.run() 154.425 6,575,955,192 78.124
                  org.apache.impala.catalog.HdfsTable$FileMetadataLoadRequest.call() 155.678 6,561,367,568 77.951
                     org.apache.impala.catalog.HdfsTable$FileMetadataLoadRequest.call() 155.678 6,561,367,568 77.951
                        org.apache.impala.catalog.HdfsTable.access$000(HdfsTable, Path, List) 155.678 6,561,367,568 77.951
                           org.apache.impala.catalog.HdfsTable.refreshFileMetadata(Path, List) 155.678 6,561,367,568 77.951
                              org.apache.impala.common.FileSystemUtil.listStatus(FileSystem, Path) 164.294 5,958,270,360 70.786
                                 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(Path) 164.294 5,958,270,360 70.786
                                    org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystem, Path) 164.294 5,958,270,360 70.786
                                       org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(Path) 164.294 5,958,270,360 70.786
                                          org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(Path) 164.294 5,958,270,360 70.786
                                             org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem, Path) 164.294 5,958,270,360 70.786
                                                org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(Path) 164.294 5,958,270,360 70.786
                                                   org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(URI, Path) 188.964 4,715,516,408 56.022
                                                      org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(Path) 190.731 4,649,582,248 55.238
                                                         org.apache.hadoop.fs.Path.<init>(Path, String) 193.378 4,543,189,320 53.974
                                                            org.apache.hadoop.fs.Path.<init>(Path, Path) 202.23 4,204,506,424 49.951
                                                               org.apache.hadoop.fs.Path.initialize(String, String, String, String) 231.389 1,623,793,272 19.291
                                                               java.net.URI.<init>(String, String, String, String, String) 162.808 1,219,880,472 14.493
                                                               java.net.URI.resolve(URI) 226.126 596,637,792 7.088
                                                               java.lang.StringBuilder.append(String) 253.489 404,781,104 4.809
                                                               java.lang.StringBuilder.toString() 132.941 180,183,984 2.141
                                                               java.lang.StringBuilder.<init>() 48 72,680,008 0.863
      

      A different GC strategy may relieve some of the memory pressure, but it would be better to see if we can reduce the working memory itself.
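
      One possible direction (an illustrative sketch only, not a proposed Impala patch; all names here are hypothetical) is to qualify the parent directory once and build child paths from a cached string prefix, replacing the per-file URI resolution with a single String allocation per file:

      ```java
      import java.net.URI;

      // Hypothetical mitigation sketch: qualify the parent once, then append
      // child names to a cached prefix instead of running URI resolution per
      // file. Class and method names are illustrative, not Impala's actual fix.
      public class CachedPrefixSketch {
          private final String prefix;  // fully-qualified parent path, ends in '/'

          CachedPrefixSketch(URI parentDir) {
              String s = parentDir.toString();
              this.prefix = s.endsWith("/") ? s : s + "/";
          }

          // One String allocation per child instead of URI + StringBuilder churn.
          String childPath(String name) {
              return prefix + name;
          }

          public static void main(String[] args) {
              CachedPrefixSketch dir =
                  new CachedPrefixSketch(URI.create("hdfs://nn:8020/warehouse/db/tbl"));
              System.out.println(dir.childPath("part-00000"));
          }
      }
      ```

      The trade-off is that the cached prefix skips per-child URI validation, so it only works when child names are known to be plain file names.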

            People

            • Assignee:
              Unassigned
            • Reporter:
              jyu@cloudera.com Juan Yu
