Sentry / SENTRY-1907

Potential memory optimization when handling big full snapshots.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: Sentry
    • Labels:
      None

      Description

      PathImageRetriever.retrieveFullImage() has the following code:

            for (Map.Entry<String, Set<String>> pathEnt : pathImage.entrySet()) {
              TPathChanges pathChange = pathsUpdate.newPathChange(pathEnt.getKey());
      
              for (String path : pathEnt.getValue()) {
                pathChange.addToAddPaths(Lists.newArrayList(Splitter.on("/").split(path))); // here
              }
            }
      

      We convert each path into a list of its components, so /a/b/c becomes {a, b, c}. Many paths share the same components, so the resulting lists hold a large number of duplicate strings. After splitting, we should intern each component before adding it.
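The interning idea can be sketched as follows. This is a minimal, standalone illustration and not the code from the attached patch: the class name PathComponents and the method splitAndIntern are hypothetical, it splits with plain JDK calls instead of Guava's Splitter, and it assumes String.intern() is an acceptable interning mechanism for this workload.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Sentry: splits a path on '/' and interns
// each component so that equal components share a single String instance.
final class PathComponents {
  private PathComponents() {}

  static List<String> splitAndIntern(String path) {
    List<String> components = new ArrayList<>();
    int start = 0;
    // Empty components are kept, so a leading or trailing '/' produces "".
    while (start <= path.length()) {
      int end = path.indexOf('/', start);
      if (end < 0) {
        end = path.length();
      }
      // substring() allocates a new String; intern() replaces it with the
      // canonical copy from the JVM string pool.
      components.add(path.substring(start, end).intern());
      start = end + 1;
    }
    return components;
  }

  public static void main(String[] args) {
    List<String> first = splitAndIntern("a/b/c");
    List<String> second = splitAndIntern("a/b/d");
    System.out.println(first);                         // prints [a, b, c]
    System.out.println(first.get(0) == second.get(0)); // prints true
  }
}
```

In the loop quoted above, the equivalent change would be to intern each component produced by the split before passing the list to addToAddPaths. Whether String.intern() or a separate interner (for example one from Guava's Interners) is the better choice depends on how long the components need to stay alive, which this issue does not state.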

      The duplication was observed by code inspection and confirmed by JXray analysis (thanks Misha Dmitriev), which shows that 61% of memory is used by duplicate strings and reports the following reference chain and GC root stack trace:

      4. REFERENCE CHAINS WITH HIGH RETAINED MEMORY (MAY SIGNAL MEMORY LEAK)
      
       ---- Object tree for GC root(s) Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate) ----
      
        4,159,037K (33.4%) (1 of org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
           <-- Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
        4,135,376K (33.3%) (4897951 of j.u.ArrayList)
           <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathChanges.addPaths <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathChanges <-- Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
        3,652,177K (29.4%) (52086231 objects)
           <-- {j.u.ArrayList} <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathChanges.addPaths <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathChanges <-- Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
        GC root stack trace:
          org.apache.sentry.hdfs.service.thrift.TPathsUpdate$TPathsUpdateStandardScheme.write(TPathsUpdate.java:754)
          org.apache.sentry.hdfs.service.thrift.TPathsUpdate$TPathsUpdateStandardScheme.write(TPathsUpdate.java:671)
          org.apache.sentry.hdfs.service.thrift.TPathsUpdate.write(TPathsUpdate.java:584)
          org.apache.sentry.hdfs.service.thrift.TAuthzUpdateResponse$TAuthzUpdateResponseStandardScheme.write(TAuthzUpdateResponse.java:505)
          org.apache.sentry.hdfs.service.thrift.TAuthzUpdateResponse$TAuthzUpdateResponseStandardScheme.write(TAuthzUpdateResponse.java:435)
          org.apache.sentry.hdfs.service.thrift.TAuthzUpdateResponse.write(TAuthzUpdateResponse.java:377)
          org.apache.sentry.hdfs.service.thrift.SentryHDFSService$get_authz_updates_result$get_authz_updates_resultStandardScheme.write(SentryHDFSService.java:3608)
          org.apache.sentry.hdfs.service.thrift.SentryHDFSService$get_authz_updates_result$get_authz_updates_resultStandardScheme.write(SentryHDFSService.java:3572)
          org.apache.sentry.hdfs.service.thrift.SentryHDFSService$get_authz_updates_result.write(SentryHDFSService.java:3523)
          org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
          org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
          org.apache.sentry.hdfs.SentryHDFSServiceProcessorFactory$ProcessorWrapper.process(SentryHDFSServiceProcessorFactory.java:47)
          org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
          org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
          java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          java.lang.Thread.run(Thread.java:745)
      

        Attachments

        1. SENTRY-1907.01.patch
          4 kB
          Alexander Kolbasov


              People

              • Assignee:
                akolb Alexander Kolbasov
                Reporter:
                akolb Alexander Kolbasov