  1. Hadoop Common
  2. HADOOP-14523

OpensslAesCtrCryptoCodec.finalize() holds excessive amounts of memory

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I recently analyzed JVM heap dumps from Hive running a big workload. Two excerpts from the analysis done with JXRay (www.jxray.com) are given below. It turns out that nearly half of the live memory is taken by objects awaiting finalization, and the biggest offender among them is the class OpensslAesCtrCryptoCodec:

        401,189K (39.7%) (1 of sun.misc.Cleaner)
           <-- Java Static: sun.misc.Cleaner.first
        400,572K (39.6%) (14001 of org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager, java.util.jar.JarFile etc.)
           <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- sun.misc.Cleaner.next <-- sun.misc.Cleaner.{next} <-- Java Static: sun.misc.Cleaner.first
        270,673K (26.8%) (2138 of org.apache.hadoop.mapred.JobConf)
           <-- org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.conf <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- sun.misc.Cleaner.next <-- sun.misc.Cleaner.{next} <-- Java Static: sun.misc.Cleaner.first
      
      ---------------------
      
        102,232K (10.1%) (1 of j.l.r.Finalizer)
           <-- Java Static: java.lang.ref.Finalizer.unfinalized
        101,676K (10.1%) (8613 of org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, java.util.zip.ZipFile$ZipFileInflaterInputStream, org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager etc.)
           <-- j.l.r.Finalizer.referent <-- j.l.r.Finalizer.{next} <-- Java Static: java.lang.ref.Finalizer.unfinalized
      

      This heap dump was taken using 'jmap -dump:live', which forces the JVM to run a full GC before dumping the heap. So we are already looking at the heap right after GC, and yet all these unfinalized objects are there. I think this happens because the JVM always runs only one finalization thread, and thus the queue of objects that need finalization may get processed too slowly. My understanding is that finalization works as follows:

      1. When GC runs, it discovers that an object x that overrides finalize() is unreachable.
      2. x is added to the finalization queue. So technically x is still reachable: it occupies memory, and all the objects that it references stay in memory as well.
      3. The finalization thread processes objects from the finalization queue serially, so x may stay in memory for a long time.
      4. x.finalize() is invoked, then x is made unreachable. If x stayed in memory for a long time, it is now in the Old Gen of the heap, so only a full GC can clean it up.
      5. When a full GC finally occurs, x gets cleaned up.
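
One rough way to observe such a backlog from inside a JVM is to sample the pending-finalization count exposed by the standard MemoryMXBean. The sketch below is only an illustration: SlowFinalizable is a hypothetical stand-in for a codec (it is not Hadoop code), the payload size and sleep time are arbitrary assumptions, and since System.gc() is only a request the number reported will vary from run to run.

```java
import java.lang.management.ManagementFactory;

class FinalizerBacklogDemo {
    /** Hypothetical stand-in for a codec: finalizable, and it pins a payload. */
    static final class SlowFinalizable {
        private final byte[] payload = new byte[64 * 1024];

        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() throws Throwable {
            // Simulate slow cleanup work on the single finalizer thread.
            Thread.sleep(1);
        }
    }

    /** Allocates n short-lived finalizable objects, requests a GC, samples the backlog. */
    @SuppressWarnings({"deprecation", "removal"})
    static int pendingAfterGc(int n) {
        for (int i = 0; i < n; i++) {
            new SlowFinalizable(); // immediately unreachable from application code
        }
        System.gc(); // only a request; the JVM may ignore it
        // Approximate number of objects whose finalize() has not run yet.
        return ManagementFactory.getMemoryMXBean().getObjectPendingFinalizationCount();
    }

    public static void main(String[] args) {
        System.out.println("pending finalization: " + pendingAfterGc(2000));
    }
}
```

A count that stays high across several samples would be consistent with phase 3 above: the objects are already dead as far as the application is concerned, but they and everything they reference are still held in memory.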

      So finalization is formally reliable, but in practice it's quite possible that a lot of unreachable but unfinalized objects flood the memory. I guess we are seeing all these OpensslAesCtrCryptoCodec objects when they are in phase 3 above. And the really bad thing is that these objects in turn keep a whole lot of other stuff in memory, in particular JobConf objects. Such a JobConf has nothing to do with finalization, yet the GC cannot release it until the corresponding OpensslAesCtrCryptoCodec is gone.

      Here is OpensslAesCtrCryptoCodec.finalize() method with my comments:

      protected void finalize() throws Throwable {
        try {
          Closeable r = (Closeable) this.random;
          r.close();  // Relevant only when (random instanceof OsSecureRandom == true)
        } catch (ClassCastException e) {
        }
        super.finalize();  // Not needed, no finalize() in superclasses
      }
      

      So finalize() in this class, which may keep a whole tree of objects in memory, is relevant only when the codec is configured to use the OsSecureRandom class. The latter reads random bytes from the configured file and needs finalization to close the input stream associated with that file.

      The suggested fix is to remove finalize() from OpensslAesCtrCryptoCodec and add it to the only class from this "family" that really needs it, OsSecureRandom. That will ensure that only OsSecureRandom objects (if/when they are used) stay in memory awaiting finalization, and no other, irrelevant objects.
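
The shape of this change can be sketched as follows. This is not the attached patch: SketchOsRandom and SketchCodec are hypothetical stand-ins for OsSecureRandom and OpensslAesCtrCryptoCodec, and the Object-typed conf field only marks where a large JobConf would hang.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for OsSecureRandom: the only class that owns an OS resource. */
class SketchOsRandom implements Closeable {
    private InputStream stream;

    SketchOsRandom(InputStream stream) {
        this.stream = stream;
    }

    synchronized boolean isClosed() {
        return stream == null;
    }

    @Override
    public synchronized void close() throws IOException {
        if (stream != null) {
            stream.close();
            stream = null;
        }
    }

    /** Safety net only: if nobody called close(), the finalizer releases the stream. */
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        close();
    }
}

/** Hypothetical stand-in for the codec: it declares no finalize() of its own. */
class SketchCodec {
    private final Object conf;           // placeholder for a large configuration object
    private final SketchOsRandom random; // only this small object awaits finalization

    SketchCodec(Object conf, SketchOsRandom random) {
        this.conf = conf;
        this.random = random;
    }

    SketchOsRandom getRandom() {
        return random;
    }
}
```

With this layout, an unreachable SketchCodec and its conf can be reclaimed by an ordinary GC cycle, and only the small random-source object waits in the finalization queue.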

      Note that this solution means that streams are still closed lazily. This, in principle, may cause its own problems. So the most reliable fix would be to call OsSecureRandom.close() explicitly when it's not needed anymore. But the above fix is a necessary first step anyway: it will remove the most acute problem with memory and will not make any other things worse than they currently are.
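
For comparison, deterministic closing could look roughly like the sketch below, which uses try-with-resources. StreamRandom is a hypothetical Closeable random source, and the ByteArrayInputStream only stands in for the configured random device file; how the real callers would manage the codec's lifetime is left open here.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

class ExplicitCloseSketch {
    /** Hypothetical random source that reads bytes from a stream it owns. */
    static final class StreamRandom implements Closeable {
        private final InputStream in;
        private boolean closed;

        StreamRandom(InputStream in) {
            this.in = in;
        }

        int nextByte() throws IOException {
            int b = in.read();
            if (b < 0) {
                throw new IOException("random source exhausted");
            }
            return b;
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() throws IOException {
            closed = true;
            in.close();
        }
    }

    /** Reads one byte and closes the source at the end of the block, with no finalizer. */
    static StreamRandom useAndClose(byte[] bytes) throws IOException {
        StreamRandom random = new StreamRandom(new ByteArrayInputStream(bytes));
        try (StreamRandom scoped = random) {
            scoped.nextByte();
        }
        return random; // returned only so that a caller can inspect isClosed()
    }
}
```

Here the stream is released as soon as the block exits, so nothing depends on when the finalizer thread gets around to the object.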

      1. HADOOP-14523.01.patch
        2 kB
        Misha Dmitriev
      2. HADOOP-14523.02.patch
        2 kB
        Misha Dmitriev

          Activity

          jzhuge John Zhuge added a comment -

          Great work, Misha Dmitriev. Have you had a chance to rerun JXRay after the fix?

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11868 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11868/)
          HADOOP-14523. OpensslAesCtrCryptoCodec.finalize() holds excessive (xiao: rev ef8edab930338646551cbe3c7e7cf954e21c0f9a)

          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/random/OsSecureRandom.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/OpensslAesCtrCryptoCodec.java
          xiaochen Xiao Chen added a comment -

          Committed to trunk and branch-2. Thank you Misha Dmitriev for your contribution!

          xiaochen Xiao Chen added a comment -

          The change is for performance, so I feel we don't need a unit test. The Findbugs warnings are extant and not relevant to the patch.

          +1 on patch 2, committing this.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem   Runtime  Comment
          0     reexec      0m 14s   Docker mode activated.
          +1    @author     0m 0s    The patch does not contain any @author tags.
          -1    test4tests  0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1    mvninstall  14m 12s  trunk passed
          +1    compile     13m 25s  trunk passed
          +1    checkstyle  0m 36s   trunk passed
          +1    mvnsite     1m 4s    trunk passed
          -1    findbugs    1m 23s   hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings.
          +1    javadoc     0m 49s   trunk passed
          +1    mvninstall  0m 38s   the patch passed
          +1    compile     10m 5s   the patch passed
          +1    javac       10m 5s   the patch passed
          +1    checkstyle  0m 35s   the patch passed
          +1    mvnsite     1m 1s    the patch passed
          +1    whitespace  0m 0s    The patch has no whitespace issues.
          +1    findbugs    1m 31s   the patch passed
          +1    javadoc     0m 49s   the patch passed
          +1    unit        7m 58s   hadoop-common in the patch passed.
          +1    asflicense  0m 34s   The patch does not generate ASF License warnings.
                            56m 42s



          Subsystem        Report/Notes
          Docker Image     yetus/hadoop:14b5c93
          JIRA Issue       HADOOP-14523
          JIRA Patch URL   https://issues.apache.org/jira/secure/attachment/12872897/HADOOP-14523.02.patch
          Optional Tests   asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname            Linux 29a9328959eb 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool       maven
          Personality      /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision     trunk / 036a24b
          Default Java     1.8.0_131
          findbugs         v3.1.0-RC1
          findbugs         https://builds.apache.org/job/PreCommit-HADOOP-Build/12528/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
          Test Results     https://builds.apache.org/job/PreCommit-HADOOP-Build/12528/testReport/
          modules          C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output   https://builds.apache.org/job/PreCommit-HADOOP-Build/12528/console
          Powered by       Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          misha@cloudera.com Misha Dmitriev added a comment -

          Fixed checkstyle. The test that failed on the previous patch looks unrelated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem   Runtime  Comment
          0     reexec      0m 17s   Docker mode activated.
          +1    @author     0m 0s    The patch does not contain any @author tags.
          -1    test4tests  0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1    mvninstall  13m 25s  trunk passed
          +1    compile     13m 31s  trunk passed
          +1    checkstyle  0m 35s   trunk passed
          +1    mvnsite     1m 5s    trunk passed
          -1    findbugs    1m 23s   hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings.
          +1    javadoc     0m 50s   trunk passed
          +1    mvninstall  0m 40s   the patch passed
          +1    compile     10m 25s  the patch passed
          +1    javac       10m 25s  the patch passed
          -0    checkstyle  0m 36s   hadoop-common-project/hadoop-common: The patch generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3)
          +1    mvnsite     1m 4s    the patch passed
          +1    whitespace  0m 0s    The patch has no whitespace issues.
          +1    findbugs    1m 37s   the patch passed
          +1    javadoc     0m 57s   the patch passed
          -1    unit        12m 56s  hadoop-common in the patch failed.
          +1    asflicense  0m 35s   The patch does not generate ASF License warnings.
                            61m 46s



          Reason              Tests
          Failed junit tests  hadoop.security.TestRaceWhenRelogin



          Subsystem        Report/Notes
          Docker Image     yetus/hadoop:14b5c93
          JIRA Issue       HADOOP-14523
          JIRA Patch URL   https://issues.apache.org/jira/secure/attachment/12872889/HADOOP-14523.01.patch
          Optional Tests   asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname            Linux 4c57da710495 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool       maven
          Personality      /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision     trunk / 8633ef8
          Default Java     1.8.0_131
          findbugs         v3.1.0-RC1
          findbugs         https://builds.apache.org/job/PreCommit-HADOOP-Build/12526/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
          checkstyle       https://builds.apache.org/job/PreCommit-HADOOP-Build/12526/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt
          unit             https://builds.apache.org/job/PreCommit-HADOOP-Build/12526/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
          Test Results     https://builds.apache.org/job/PreCommit-HADOOP-Build/12526/testReport/
          modules          C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output   https://builds.apache.org/job/PreCommit-HADOOP-Build/12526/console
          Powered by       Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.


            People

            • Assignee:
              misha@cloudera.com Misha Dmitriev
              Reporter:
              misha@cloudera.com Misha Dmitriev
            • Votes:
              0
              Watchers:
              6
