Details
Type: Sub-task | Status: Open | Priority: Major | Resolution: Unresolved | Label: ghx-label-4
Description
When processing a catalog metadata cache update, working memory usage can be 5x larger than the final metadata object's memory footprint. If GC doesn't recycle memory fast enough, Impala can crash with a JVM out-of-memory error.

Most of the allocation comes from the HDFS client:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Path.java#L147
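The hot path can be sketched without Hadoop on the classpath: for every file returned by listStatus, HdfsFileStatus.getFullPath builds a new Path from the parent directory plus the file name, and Path.initialize resolves a fresh java.net.URI, which in turn appends strings through a StringBuilder. In the minimal sketch below, java.net.URI stands in for org.apache.hadoop.fs.Path, and the namenode address and file names are made up for illustration:

```java
import java.net.URI;

public class PathQualifyDemo {

    // Roughly what Path(Path parent, String child) does per listing entry:
    // URI.resolve allocates a new URI plus intermediate StringBuilder and
    // String objects, so a directory with N files costs O(N) short-lived
    // objects even though the parent prefix is identical for every entry.
    public static URI qualify(URI parentDir, String childName) {
        return parentDir.resolve(childName);
    }

    public static void main(String[] args) {
        URI dir = URI.create("hdfs://nn:8020/warehouse/db/tbl/");
        for (int i = 0; i < 3; i++) {
            // Each iteration allocates a brand-new, never-shared URI.
            System.out.println(qualify(dir, "part-0000" + i));
        }
    }
}
```

This matches the shape of the trace below: the URI constructor, URI.resolve, and the StringBuilder frames all sit under Path.initialize.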
Stack Trace  |  Average Object Size (bytes)  |  Total TLAB Size (bytes)  |  Pressure (%)
java.lang.Thread.run()  |  152.486  |  6,586,166,960  |  78.246
java.util.concurrent.ThreadPoolExecutor$Worker.run()  |  152.959  |  6,583,034,136  |  78.208
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)  |  152.959  |  6,583,034,136  |  78.208
java.util.concurrent.FutureTask.run()  |  154.425  |  6,575,955,192  |  78.124
org.apache.impala.catalog.HdfsTable$FileMetadataLoadRequest.call()  |  155.678  |  6,561,367,568  |  77.951
org.apache.impala.catalog.HdfsTable$FileMetadataLoadRequest.call()  |  155.678  |  6,561,367,568  |  77.951
org.apache.impala.catalog.HdfsTable.access$000(HdfsTable, Path, List)  |  155.678  |  6,561,367,568  |  77.951
org.apache.impala.catalog.HdfsTable.refreshFileMetadata(Path, List)  |  155.678  |  6,561,367,568  |  77.951
org.apache.impala.common.FileSystemUtil.listStatus(FileSystem, Path)  |  164.294  |  5,958,270,360  |  70.786
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(Path)  |  164.294  |  5,958,270,360  |  70.786
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystem, Path)  |  164.294  |  5,958,270,360  |  70.786
org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(Path)  |  164.294  |  5,958,270,360  |  70.786
org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(Path)  |  164.294  |  5,958,270,360  |  70.786
org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem, Path)  |  164.294  |  5,958,270,360  |  70.786
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(Path)  |  164.294  |  5,958,270,360  |  70.786
org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(URI, Path)  |  188.964  |  4,715,516,408  |  56.022
org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(Path)  |  190.731  |  4,649,582,248  |  55.238
org.apache.hadoop.fs.Path.<init>(Path, String)  |  193.378  |  4,543,189,320  |  53.974
org.apache.hadoop.fs.Path.<init>(Path, Path)  |  202.23  |  4,204,506,424  |  49.951
org.apache.hadoop.fs.Path.initialize(String, String, String, String)  |  231.389  |  1,623,793,272  |  19.291
java.net.URI.<init>(String, String, String, String, String)  |  162.808  |  1,219,880,472  |  14.493
java.net.URI.resolve(URI)  |  226.126  |  596,637,792  |  7.088
java.lang.StringBuilder.append(String)  |  253.489  |  404,781,104  |  4.809
java.lang.StringBuilder.toString()  |  132.941  |  180,183,984  |  2.141
java.lang.StringBuilder.<init>()  |  48  |  72,680,008  |  0.863
A different GC strategy may relieve some of the memory pressure, but it would be better to see whether the working memory itself can be reduced.
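One possible direction, sketched here as an assumption rather than a committed fix: since every entry in a listing shares the same parent directory, the parent's string form could be computed once and each child path built with a single concatenation, instead of going through URI.resolve's intermediate allocations per entry. The class and method names below are hypothetical, and java.net.URI again stands in for the Hadoop Path type:

```java
import java.net.URI;

// Hypothetical mitigation sketch: amortize the parent-path work across all
// entries of one directory listing instead of re-resolving it per file.
public class CheapQualify {
    private final String parentPrefix; // e.g. "hdfs://nn:8020/warehouse/db/tbl/"

    public CheapQualify(URI parentDir) {
        String s = parentDir.toString();
        // Normalize once so every child is a plain append.
        this.parentPrefix = s.endsWith("/") ? s : s + "/";
    }

    // One String allocation per child, versus the several URI/StringBuilder
    // allocations that URI.resolve (or Path.initialize) performs.
    public String qualify(String childName) {
        return parentPrefix + childName;
    }

    public static void main(String[] args) {
        CheapQualify q = new CheapQualify(URI.create("hdfs://nn:8020/warehouse/db/tbl"));
        System.out.println(q.qualify("part-00000"));
    }
}
```

Whether this fits Impala's actual call sites depends on how widely the qualified Path objects are consumed downstream; the point of the sketch is only that the per-entry working set shrinks when the shared prefix is hoisted out of the loop.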