Description
PathImageRetriever.retrieveFullImage() has the following code:
for (Map.Entry<String, Set<String>> pathEnt : pathImage.entrySet()) {
  TPathChanges pathChange = pathsUpdate.newPathChange(pathEnt.getKey());
  for (String path : pathEnt.getValue()) {
    pathChange.addToAddPaths(Lists.newArrayList(Splitter.on("/").split(path))); // here
  }
}
We convert many path objects to lists of strings, one string per component, so /a/b/c becomes {a, b, c}. The components contain a large number of duplicates, so after splitting we should intern each component before adding it.
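The proposed change can be sketched as follows. This is only an illustration under stated assumptions, not the actual patch: the class PathInterning and the helper splitAndIntern are hypothetical names, and the sketch uses String.split and String.intern from the standard library instead of the Guava Splitter used in the original code.

```java
import java.util.ArrayList;
import java.util.List;

public class PathInterning {

    // Hypothetical helper (not part of Sentry): splits a path on "/" and
    // interns every component, so equal components from different paths
    // share a single String instance instead of one copy per path.
    public static List<String> splitAndIntern(String path) {
        // A limit of -1 keeps empty components, e.g. the leading "" in "/a/b/c".
        String[] parts = path.split("/", -1);
        List<String> components = new ArrayList<>(parts.length);
        for (String part : parts) {
            components.add(part.intern());
        }
        return components;
    }

    public static void main(String[] args) {
        List<String> first = splitAndIntern("a/b/c");
        List<String> second = splitAndIntern("a/b/d");
        // Reference comparison on purpose: both lists hold the same "a" instance.
        System.out.println(first.get(0) == second.get(0)); // prints "true"
    }
}
```

In retrieveFullImage(), a list built this way would presumably be passed to pathChange.addToAddPaths() in place of the Lists.newArrayList(...) result. Whether String.intern() or a dedicated interner is the better fit for this workload is a design choice this sketch does not settle.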
This was observed by code inspection and confirmed by jxray analysis (thanks misha@cloudera.com), which shows that 61% of memory is used by duplicate strings and reports the following reference chains and GC root stack trace:
4. REFERENCE CHAINS WITH HIGH RETAINED MEMORY (MAY SIGNAL MEMORY LEAK)

---- Object tree for GC root(s) Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate) ----

4,159,037K (33.4%) (1 of org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
  <-- Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
4,135,376K (33.3%) (4897951 of j.u.ArrayList)
  <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathChanges.addPaths <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathChanges <-- Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
3,652,177K (29.4%) (52086231 objects)
  <-- {j.u.ArrayList} <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathChanges.addPaths <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathChanges <-- Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)

GC root stack trace:
  org.apache.sentry.hdfs.service.thrift.TPathsUpdate$TPathsUpdateStandardScheme.write(TPathsUpdate.java:754)
  org.apache.sentry.hdfs.service.thrift.TPathsUpdate$TPathsUpdateStandardScheme.write(TPathsUpdate.java:671)
  org.apache.sentry.hdfs.service.thrift.TPathsUpdate.write(TPathsUpdate.java:584)
  org.apache.sentry.hdfs.service.thrift.TAuthzUpdateResponse$TAuthzUpdateResponseStandardScheme.write(TAuthzUpdateResponse.java:505)
  org.apache.sentry.hdfs.service.thrift.TAuthzUpdateResponse$TAuthzUpdateResponseStandardScheme.write(TAuthzUpdateResponse.java:435)
  org.apache.sentry.hdfs.service.thrift.TAuthzUpdateResponse.write(TAuthzUpdateResponse.java:377)
  org.apache.sentry.hdfs.service.thrift.SentryHDFSService$get_authz_updates_result$get_authz_updates_resultStandardScheme.write(SentryHDFSService.java:3608)
  org.apache.sentry.hdfs.service.thrift.SentryHDFSService$get_authz_updates_result$get_authz_updates_resultStandardScheme.write(SentryHDFSService.java:3572)
  org.apache.sentry.hdfs.service.thrift.SentryHDFSService$get_authz_updates_result.write(SentryHDFSService.java:3523)
  org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
  org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  org.apache.sentry.hdfs.SentryHDFSServiceProcessorFactory$ProcessorWrapper.process(SentryHDFSServiceProcessorFactory.java:47)
  org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
  org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  java.lang.Thread.run(Thread.java:745)
Attachments
Issue Links
- is related to SENTRY-1915 Sentry is doing a lot of work to convert list of paths to HMSPaths structure (Resolved)