Derby
  1. Derby
  2. DERBY-5730

DataDictionaryImpl leaks references to itself via SYSFUN_AD

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.9.1.0
    • Fix Version/s: 10.8.3.3, 10.9.1.0
    • Component/s: Services
    • Labels:
      None
    • Bug behavior facts:
      Crash

      Description

      DataDictionaryImpl contains a static field called SYSFUN_AD, which holds an array of AliasDescriptor objects for the functions in the SYSFUN schema. This array is lazily populated as the functions are called, and it is shared between all DataDictionaryImpl instances on the system.

      The AliasDescriptors contain references to the DataDictionaryImpl that created them, so SYSFUN_AD may end up referencing indirectly to a number of different DataDictionaryImpl instances, depending on where the respective SYSFUN functions are called the first time. Once an AliasDescriptor has been added to SYSFUN_AD, it stays there until the DataDictionaryImpl class is unloaded (in most cases, until the JVM terminates). This means the array may hold references to DataDictionaryImpl instances that belong to database instances that have been shut down, and that the memory held by those database instances never becomes eligible for garbage collection.

      1. DERBY5730_backport_patch1_diff.txt
        2 kB
        Mamta A. Satoor
      2. SysfunPerftest.java
        0.9 kB
        Knut Anders Hatlen
      3. d5730-1a.diff
        9 kB
        Knut Anders Hatlen
      4. Sysfun.java
        4 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Hide
          Knut Anders Hatlen added a comment -

          Attaching a repro that exposes the leak.

          The repro repeatedly boot a database, performs some db operations, and shuts down the database. One of the db operations it performs, is preparing a statement that calls one of the SYSFUN functions, and it is a different function in each iteration.

          When running the repro with -Xmx20M, I see an OutOfMemoryError. If I comment out the preparing of the function call, the repro completes successfully with -Xmx5M.

          Show
          Knut Anders Hatlen added a comment - Attaching a repro that exposes the leak. The repro repeatedly boot a database, performs some db operations, and shuts down the database. One of the db operations it performs, is preparing a statement that calls one of the SYSFUN functions, and it is a different function in each iteration. When running the repro with -Xmx20M, I see an OutOfMemoryError. If I comment out the preparing of the function call, the repro completes successfully with -Xmx5M.
          Hide
          Knut Anders Hatlen added a comment -

          Marking the bug as a crash, since it's a memory leak that could potentially cause OOME.

          Show
          Knut Anders Hatlen added a comment - Marking the bug as a crash, since it's a memory leak that could potentially cause OOME.
          Hide
          Knut Anders Hatlen added a comment -

          The attached patch makes each DataDictionaryImpl instance have its own copy of the AliasDescriptors for the SYSFUN functions. This avoids keeping references to old dictionary instances in a shared static array, and allows gc of the instances before the DataDictionaryImpl class itself is eligible for gc.

          The patch also adds a new test to the memory suite. The test reports an OOME without the fix.

          All the regression tests ran cleanly with the patch.

          Show
          Knut Anders Hatlen added a comment - The attached patch makes each DataDictionaryImpl instance have its own copy of the AliasDescriptors for the SYSFUN functions. This avoids keeping references to old dictionary instances in a shared static array, and allows gc of the instances before the DataDictionaryImpl class itself is eligible for gc. The patch also adds a new test to the memory suite. The test reports an OOME without the fix. All the regression tests ran cleanly with the patch.
          Hide
          Mike Matrigali added a comment -

          the patch looks good to me.

          Can you comment
          on expected performance degredation now that we are caching less, I don't know what work the caching is saving. Since we are still caching privately I think only expected degredation is on first call for booted db, though I know there are some applications out there that boot very often. If there were no memory consequences does the existing
          implementation have any correctness issues?

          Show
          Mike Matrigali added a comment - the patch looks good to me. Can you comment on expected performance degredation now that we are caching less, I don't know what work the caching is saving. Since we are still caching privately I think only expected degredation is on first call for booted db, though I know there are some applications out there that boot very often. If there were no memory consequences does the existing implementation have any correctness issues?
          Hide
          Knut Anders Hatlen added a comment -

          > Can you comment on expected performance degredation now that we are
          > caching less, I don't know what work the caching is saving. Since we
          > are still caching privately I think only expected degredation is on
          > first call for booted db, though I know there are some applications
          > out there that boot very often.

          That's right, degradation should only be expected on the first call of
          a SYSFUN function after the second db boot.

          I'm not sure what degradation to expect myself, so I've written a
          small test program that executes the following statement repeatedly:

          values atan(tan(asin(sin(acos(cos(pi()))))))

          The statement calls seven different SYSFUN functions, and I would
          expect it to be very close to the worst case since VALUES statements
          are among the cheapest ones to compile.

          I also turned off the statement cache, so that each execution of the
          statement would cause a compilation (it is during the bind phase of
          the compilation the SYSFUN_AD array is accessed).

          Using the test program, I compared clean trunk to a variant that was
          modified to skip the caching altogether. That is, I commented out the
          following line in DataDictionaryImpl.getRoutineList():

          DataDictionaryImpl.SYSFUN_AD[f] = ad;

          The variant that didn't cache the alias descriptor, executed the
          statement 2.5% slower on average in my environment.

          Since this degradation will only be observed when SYSFUN functions are
          compiled for the first time in newly booted databases, I'd say it's
          negligible.

          One might perhaps even say that the gains of the alias descriptor
          cache are so small that they don't justify the extra memory and code
          complexity, but I don't think we want to go there in this JIRA.

          > If there were no memory consequences does the existing
          > implementation have any correctness issues?

          Not that I have been able to find. The data dictionary instance is
          only used in the following situations, as far as I can tell:

          • At initialization time of the AliasDescriptor to find the UUID of
            the SYSFUN schema. This doesn't cause any problems, since the data
            dictionary instance is still alive at that time, and the UUID of the
            SYSFUN schema is the same in all databases.
          • When an alias is dropped, but the functions in SYSFUN can't be
            dropped.
          • During invalidation, but SYSFUN functions can't be invalidated.
          Show
          Knut Anders Hatlen added a comment - > Can you comment on expected performance degredation now that we are > caching less, I don't know what work the caching is saving. Since we > are still caching privately I think only expected degredation is on > first call for booted db, though I know there are some applications > out there that boot very often. That's right, degradation should only be expected on the first call of a SYSFUN function after the second db boot. I'm not sure what degradation to expect myself, so I've written a small test program that executes the following statement repeatedly: values atan(tan(asin(sin(acos(cos(pi())))))) The statement calls seven different SYSFUN functions, and I would expect it to be very close to the worst case since VALUES statements are among the cheapest ones to compile. I also turned off the statement cache, so that each execution of the statement would cause a compilation (it is during the bind phase of the compilation the SYSFUN_AD array is accessed). Using the test program, I compared clean trunk to a variant that was modified to skip the caching altogether. That is, I commented out the following line in DataDictionaryImpl.getRoutineList(): DataDictionaryImpl.SYSFUN_AD [f] = ad; The variant that didn't cache the alias descriptor, executed the statement 2.5% slower on average in my environment. Since this degradation will only be observed when SYSFUN functions are compiled for the first time in newly booted databases, I'd say it's negligible. One might perhaps even say that the gains of the alias descriptor cache are so small that they don't justify the extra memory and code complexity, but I don't think we want to go there in this JIRA. > If there were no memory consequences does the existing > implementation have any correctness issues? Not that I have been able to find. The data dictionary instance is only used in the following situations, as far as I can tell: At initialization time of the AliasDescriptor to find the UUID of the SYSFUN schema. This doesn't cause any problems, since the data dictionary instance is still alive at that time, and the UUID of the SYSFUN schema is the same in all databases. When an alias is dropped, but the functions in SYSFUN can't be dropped. During invalidation, but SYSFUN functions can't be invalidated.
          Hide
          Knut Anders Hatlen added a comment -

          Attaching the program I used to test the performance impact.

          Show
          Knut Anders Hatlen added a comment - Attaching the program I used to test the performance impact.
          Hide
          Knut Anders Hatlen added a comment -

          Committed revision 1335423.

          Show
          Knut Anders Hatlen added a comment - Committed revision 1335423.
          Hide
          Mamta A. Satoor added a comment - - edited

          I am working on backporting the fix to 10.8 codeline and I get error for the newly added test through the patch(I am using IBM jdk1.6 to run the test)
          1) testLeak(org.apache.derbyTesting.functionTests.tests.memory.Derby5730Test)java.security.AccessControlException: Access denied (java.io.FilePermission C:\Program Files (x86)\IBM\Java60\jre\bin\java execute)
          at java.security.AccessController.checkPermission(AccessController.java:108)
          at java.lang.SecurityManager.checkPermission(SecurityManager.java:533)
          at java.lang.SecurityManager.checkExec(SecurityManager.java:780)
          at java.lang.ProcessBuilder.start(ProcessBuilder.java:448)
          at java.lang.Runtime.exec(Runtime.java:593)
          at java.lang.Runtime.exec(Runtime.java:466)
          at org.apache.derbyTesting.junit.BaseTestCase$8.run(BaseTestCase.java:558)
          at java.security.AccessController.doPrivileged(AccessController.java:251)
          at org.apache.derbyTesting.junit.BaseTestCase.execJavaCmd(BaseTestCase.java:554)
          at org.apache.derbyTesting.functionTests.tests.memory.Derby5730Test.testLeak(Derby5730Test.java:64)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:113)

          The error happens whether the test is run stand alone, or as part of the entire junit suite. I ran the test with classes and with jars and still get the same error. Do I need to change something in derby_tests.policy for this test to run correctly? I have been looking at derby mailing list to look for a fix for access control exception but haven't found anything yet. Thanks

          Show
          Mamta A. Satoor added a comment - - edited I am working on backporting the fix to 10.8 codeline and I get error for the newly added test through the patch(I am using IBM jdk1.6 to run the test) 1) testLeak(org.apache.derbyTesting.functionTests.tests.memory.Derby5730Test)java.security.AccessControlException: Access denied (java.io.FilePermission C:\Program Files (x86)\IBM\Java60\jre\bin\java execute) at java.security.AccessController.checkPermission(AccessController.java:108) at java.lang.SecurityManager.checkPermission(SecurityManager.java:533) at java.lang.SecurityManager.checkExec(SecurityManager.java:780) at java.lang.ProcessBuilder.start(ProcessBuilder.java:448) at java.lang.Runtime.exec(Runtime.java:593) at java.lang.Runtime.exec(Runtime.java:466) at org.apache.derbyTesting.junit.BaseTestCase$8.run(BaseTestCase.java:558) at java.security.AccessController.doPrivileged(AccessController.java:251) at org.apache.derbyTesting.junit.BaseTestCase.execJavaCmd(BaseTestCase.java:554) at org.apache.derbyTesting.functionTests.tests.memory.Derby5730Test.testLeak(Derby5730Test.java:64) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:113) The error happens whether the test is run stand alone, or as part of the entire junit suite. I ran the test with classes and with jars and still get the same error. Do I need to change something in derby_tests.policy for this test to run correctly? I have been looking at derby mailing list to look for a fix for access control exception but haven't found anything yet. Thanks
          Hide
          Knut Anders Hatlen added a comment -

          The required execute permission was added to derby_tests.policy in 10.9 as part of DERBY-4249.

          Show
          Knut Anders Hatlen added a comment - The required execute permission was added to derby_tests.policy in 10.9 as part of DERBY-4249 .
          Hide
          Knut Anders Hatlen added a comment -

          If this fix is backported, please make sure to backport DERBY-5753 as well to avoid regression test failures.

          Show
          Knut Anders Hatlen added a comment - If this fix is backported, please make sure to backport DERBY-5753 as well to avoid regression test failures.
          Hide
          ASF subversion and git services added a comment -

          Commit 1504205 from Mamta A. Satoor in branch 'code/branches/10'
          [ https://svn.apache.org/r1504205 ]

          DERBY-5730 (DataDictionaryImpl leaks references to itself via SYSFUN_AD)
          DERBY-5753: (Derby5730Test fails on Java 5)

          Backporting to 10.8. Original fix contributed by Knut.

          Show
          ASF subversion and git services added a comment - Commit 1504205 from Mamta A. Satoor in branch 'code/branches/10' [ https://svn.apache.org/r1504205 ] DERBY-5730 (DataDictionaryImpl leaks references to itself via SYSFUN_AD) DERBY-5753 : (Derby5730Test fails on Java 5) Backporting to 10.8. Original fix contributed by Knut.
          Hide
          Mamta A. Satoor added a comment -

          Knut, thanks for pointing to DERBY-4249 for execute permission. With those changes to derby_tests.policy, the test runs fine. derbyall and junit ran fine with the backport of this jira and DERBY-5753. I have committed the backport to 10.8

          Show
          Mamta A. Satoor added a comment - Knut, thanks for pointing to DERBY-4249 for execute permission. With those changes to derby_tests.policy, the test runs fine. derbyall and junit ran fine with the backport of this jira and DERBY-5753 . I have committed the backport to 10.8

            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Knut Anders Hatlen
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development