Derby
  1. Derby
  2. DERBY-23

just booting jdbc driver and shutting down seem to leak memory

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 10.0.2.0
    • Fix Version/s: 10.2.1.6
    • Component/s: Services
    • Labels:
      None
    • Urgency:
      Low

      Description

      Reporting for Daniel John Debrunner.

      Doing simple boot and shutdown of the driver in a loop
      seem to grow the heap forever:
      new org.apache.derby.jdbc.EmbeddedDriver(); //boot the driver
      DriverManager.getConnection("jdbc:derby:;shutdown=true");
      //shutdown the cloudscape instance completely

      after booting and shutting down 1516 times, memory used is
      ~~41931056 bytes.
      Sample GC Output:
      1931.93: [Full GC 43484K->43243K(65088K), 0.1779751 secs]
      1933.13: [Full GC 43511K->43270K(65088K), 0.1914383 secs]
      1934.47: [Full GC 43538K->43297K(65088K), 0.1808878 secs]
      1935.68: [Full GC 43564K->43324K(65088K), 0.4461623 secs]
      1937.43: [Full GC 43591K->43350K(65088K), 0.1842980 secs]
      1938.63: [Full GC 43617K->43377K(65088K), 0.1873431 secs]
      1939.85: [Full GC 43644K->43404K(65088K), 0.1948505 secs]
      1941.07: [Full GC 43671K->43430K(65088K), 0.1790895 secs]

      1. derbyall_report.txt
        205 kB
        Knut Anders Hatlen
      2. DERBY-23-with_comments.diff
        2 kB
        Knut Anders Hatlen
      3. DERBY-23-npe.stat
        0.1 kB
        Knut Anders Hatlen
      4. DERBY-23-npe.diff
        3 kB
        Knut Anders Hatlen
      5. DERBY-23.stat
        0.2 kB
        Knut Anders Hatlen
      6. DERBY-23.diff
        2 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Hide
          Knut Anders Hatlen added a comment -

          When I looked into this issue, I found a problem with the synchronization. If Derby is shut down right after the driver is loaded, the AntiGC thread will continue to run in most cases. Fixing this problem made the heap grow slower, but it was not enough to stop the memory leak entirely.

          Show
          Knut Anders Hatlen added a comment - When I looked into this issue, I found a problem with the synchronization. If Derby is shut down right after the driver is loaded, the AntiGC thread will continue to run in most cases. Fixing this problem made the heap grow slower, but it was not enough to stop the memory leak entirely.
          Hide
          Knut Anders Hatlen added a comment -

          The memory leak comes from three sources:

          1) The AntiGC thread is sometimes not terminated. This happens when
          Derby is shut down right after the driver is loaded, because the
          run() method in AntiGC starts with setting a boolean variable which
          is used to decide when the thread should stop. The same variable is
          set when a request to shut down Derby is sent, and if the shutdown
          is requested before the AntiGC thread has started running, the
          request to shut down Derby might not be noticed by AntiGC which
          will run forever.

          Solution: Don't set the boolean variable in AntiGC's run() method. It
          is enough that it is being set when the object is initialized.

          2) Every time the driver is loaded, a new ThreadGroup is created. They
          don't seem to be garbage collected even when all their threads have
          finished.

          Solution: Set the daemon property on the ThreadGroups. This way a
          ThreadGroup will be destroyed when its last running thread (or thread
          group) is destroyed.

          3) The Java Heap Analysis Tool reported that a huge number of
          ContextManager and ContextService objects were kept in the
          heap. The objects were not accessible from the root set, but they
          were not garbage collected, probably because of some cyclic
          references which the gc cannot handle.

          Solution: Break the reference cycle by nulling out the references to
          the lists of ContextManagers when ContextService objects are stopped.

          The attached patch seems to fix the problems mentioned above. Derbyall
          runs fine, and top reports that the memory usage does not increase
          when running a loop where the driver is loaded (with
          Class.forName("org...EmbeddedDriver").newInstance()) and unloaded
          (with DriverManager.getConnection("jdbc:derby:;shutdown=true")). That
          is, the memory usage of course increased during the first iteration,
          but not during the approximately 8.5 million next ones.

          I will submit tests later.

          Show
          Knut Anders Hatlen added a comment - The memory leak comes from three sources: 1) The AntiGC thread is sometimes not terminated. This happens when Derby is shut down right after the driver is loaded, because the run() method in AntiGC starts with setting a boolean variable which is used to decide when the thread should stop. The same variable is set when a request to shut down Derby is sent, and if the shutdown is requested before the AntiGC thread has started running, the request to shut down Derby might not be noticed by AntiGC which will run forever. Solution: Don't set the boolean variable in AntiGC's run() method. It is enough that it is being set when the object is initialized. 2) Every time the driver is loaded, a new ThreadGroup is created. They don't seem to be garbage collected even when all their threads have finished. Solution: Set the daemon property on the ThreadGroups. This way a ThreadGroup will be destroyed when its last running thread (or thread group) is destroyed. 3) The Java Heap Analysis Tool reported that a huge number of ContextManager and ContextService objects were kept in the heap. The objects were not accessible from the root set, but they were not garbage collected, probably because of some cyclic references which the gc cannot handle. Solution: Break the reference cycle by nulling out the references to the lists of ContextManagers when ContextService objects are stopped. The attached patch seems to fix the problems mentioned above. Derbyall runs fine, and top reports that the memory usage does not increase when running a loop where the driver is loaded (with Class.forName("org...EmbeddedDriver").newInstance()) and unloaded (with DriverManager.getConnection("jdbc:derby:;shutdown=true")). That is, the memory usage of course increased during the first iteration, but not during the approximately 8.5 million next ones. I will submit tests later.
          Hide
          Daniel John Debrunner added a comment -

          Patch looks good to me, the explanation was clear and very useful.

          Show
          Daniel John Debrunner added a comment - Patch looks good to me, the explanation was clear and very useful.
          Hide
          Satheesh Bandaram added a comment -

          Good fix. Wonder if adding the explanations in the code, where possible, would be more useful than leaving them in Jira entry.

          Bernt expressed interest in commiting the change, so I am not commiting this to SVN.

          Show
          Satheesh Bandaram added a comment - Good fix. Wonder if adding the explanations in the code, where possible, would be more useful than leaving them in Jira entry. Bernt expressed interest in commiting the change, so I am not commiting this to SVN.
          Hide
          Knut Anders Hatlen added a comment -

          Attached patch with comments where appropriate.

          Show
          Knut Anders Hatlen added a comment - Attached patch with comments where appropriate.
          Hide
          Knut Anders Hatlen added a comment -

          Fixed in revision 290644.

          Show
          Knut Anders Hatlen added a comment - Fixed in revision 290644.
          Hide
          Knut Anders Hatlen added a comment -

          As I have mentioned on derby-dev earlier, there is a problem with the
          current fix. After the fix was committed, store/encryptDatabase.sql
          sometimes failed because one or more rawStoreDaemon threads printed a
          NullPointerException to System.err.

          For an example, see:
          http://www.multinet.no/~solberg/public/Apache/TinderBox_Derby/testlog/SunOS-5.10_i86pc-i386/290646-encryptionCFB_diff.txt

          The reason for this is that the fix nulls out the reference from the
          ContextService to the list of ContextManagers when the ContextService
          is stopped. Some threads with references to the ContextService might
          still be alive when the ContextService is stopped, and they get
          NullPointerExceptions when accessing the stopped ContextService.

          I have only been able to reproduce this behaviour with the
          store/encryptDatabase.sql test on multi-cpu machines running with a
          heavy background load. It seems like the threads that get
          NullPointerExceptions are orphan threads (like the ones described in
          DERBY-594) that have not yet been able to start because of the heavy
          background load. The problem might go away when DERBY-594 is resolved,
          but I think it needs further investigation.

          Show
          Knut Anders Hatlen added a comment - As I have mentioned on derby-dev earlier, there is a problem with the current fix. After the fix was committed, store/encryptDatabase.sql sometimes failed because one or more rawStoreDaemon threads printed a NullPointerException to System.err. For an example, see: http://www.multinet.no/~solberg/public/Apache/TinderBox_Derby/testlog/SunOS-5.10_i86pc-i386/290646-encryptionCFB_diff.txt The reason for this is that the fix nulls out the reference from the ContextService to the list of ContextManagers when the ContextService is stopped. Some threads with references to the ContextService might still be alive when the ContextService is stopped, and they get NullPointerExceptions when accessing the stopped ContextService. I have only been able to reproduce this behaviour with the store/encryptDatabase.sql test on multi-cpu machines running with a heavy background load. It seems like the threads that get NullPointerExceptions are orphan threads (like the ones described in DERBY-594 ) that have not yet been able to start because of the heavy background load. The problem might go away when DERBY-594 is resolved, but I think it needs further investigation.
          Hide
          Knut Anders Hatlen added a comment -

          I have attached a patch (DERBY-23-npe.diff) that addresses the
          NullPointerException messages from rawStoreDaemon threads we have seen
          in derbyall on some occasions.

          The NullPointerExceptions are thrown because
          ContextService.threadContextList is set to null in
          ContextService.stop(), but some of the daemon threads might still try
          to access the variable. The patch adds checks for threadContextList
          being null, and returns null or false in those cases instead of
          failing with a NullPointerException.

          Five tests in derbyall failed when this patch was applied, but that
          seems to be what one should expect these days. All of the failures are
          also seen in the daily regression tests.

          Feel free to review and comment on the patch. Thanks.

          Show
          Knut Anders Hatlen added a comment - I have attached a patch ( DERBY-23 -npe.diff) that addresses the NullPointerException messages from rawStoreDaemon threads we have seen in derbyall on some occasions. The NullPointerExceptions are thrown because ContextService.threadContextList is set to null in ContextService.stop(), but some of the daemon threads might still try to access the variable. The patch adds checks for threadContextList being null, and returns null or false in those cases instead of failing with a NullPointerException. Five tests in derbyall failed when this patch was applied, but that seems to be what one should expect these days. All of the failures are also seen in the daily regression tests. Feel free to review and comment on the patch. Thanks.
          Hide
          Knut Anders Hatlen added a comment -

          I'm marking this issue as resolved.

          The reason why I haven't resolved it before, is that I still see OutOfMemoryError on some platforms. These errors are caused by the background threads running with a lower priority than the the main thread, and therefore many of the background threads don't get enough CPU time to shut down. If the main thread takes a break now and then, I don't see OutOfMemoryError on any platform. That is,

          while (true)

          { Class.forName("org.apache.derby.jdbc.EmbeddedDriver").newInstance(); DriverManager.getConnection("jdbc:derby:;shutdown=true"); Thread.sleep(1); }

          runs reliably with no memory leak on all platforms I have tested. I don't want to add a regression test for this issue, since I fear it will be timing dependent and might cause noise in the nightly testing.

          I think the issue can be closed because the remaining problem is unlikely to be seen in a real-world application (who will spend all their CPU time on loading and unloading the driver?). If it turns out to be a problem, I see these possible solutions:

          1) On shutdown, wait for all background threads to terminate.
          2) On shutdown, raise the priority of the background threads.

          If someone feels this should be addressed, please open a new issue.

          Show
          Knut Anders Hatlen added a comment - I'm marking this issue as resolved. The reason why I haven't resolved it before, is that I still see OutOfMemoryError on some platforms. These errors are caused by the background threads running with a lower priority than the the main thread, and therefore many of the background threads don't get enough CPU time to shut down. If the main thread takes a break now and then, I don't see OutOfMemoryError on any platform. That is, while (true) { Class.forName("org.apache.derby.jdbc.EmbeddedDriver").newInstance(); DriverManager.getConnection("jdbc:derby:;shutdown=true"); Thread.sleep(1); } runs reliably with no memory leak on all platforms I have tested. I don't want to add a regression test for this issue, since I fear it will be timing dependent and might cause noise in the nightly testing. I think the issue can be closed because the remaining problem is unlikely to be seen in a real-world application (who will spend all their CPU time on loading and unloading the driver?). If it turns out to be a problem, I see these possible solutions: 1) On shutdown, wait for all background threads to terminate. 2) On shutdown, raise the priority of the background threads. If someone feels this should be addressed, please open a new issue.
          Hide
          Andrew McIntyre added a comment -

          This issue has been resolved for over a year with no further movement. Closing.

          Show
          Andrew McIntyre added a comment - This issue has been resolved for over a year with no further movement. Closing.

            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Tulika Agrawal
            • Votes:
              1 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development