|
[
Permlink
| « Hide
]
Knut Anders Hatlen added a comment - 17/Sep/05 07:45 AM
When I looked into this issue, I found a problem with the synchronization. If Derby is shut down right after the driver is loaded, the AntiGC thread will continue to run in most cases. Fixing this problem made the heap grow slower, but it was not enough to stop the memory leak entirely.
The memory leak comes from three sources:
1) The AntiGC thread is sometimes not terminated. This happens when Derby is shut down right after the driver is loaded, because the run() method in AntiGC starts with setting a boolean variable which is used to decide when the thread should stop. The same variable is set when a request to shut down Derby is sent, and if the shutdown is requested before the AntiGC thread has started running, the request to shut down Derby might not be noticed by AntiGC which will run forever. Solution: Don't set the boolean variable in AntiGC's run() method. It is enough that it is being set when the object is initialized. 2) Every time the driver is loaded, a new ThreadGroup is created. They don't seem to be garbage collected even when all their threads have finished. Solution: Set the daemon property on the ThreadGroups. This way a ThreadGroup will be destroyed when its last running thread (or thread group) is destroyed. 3) The Java Heap Analysis Tool reported that a huge number of ContextManager and ContextService objects were kept in the heap. The objects were not accessible from the root set, but they were not garbage collected, probably because of some cyclic references which the gc cannot handle. Solution: Break the reference cycle by nulling out the references to the lists of ContextManagers when ContextService objects are stopped. The attached patch seems to fix the problems mentioned above. Derbyall runs fine, and top reports that the memory usage does not increase when running a loop where the driver is loaded (with Class.forName("org...EmbeddedDriver").newInstance()) and unloaded (with DriverManager.getConnection("jdbc:derby:;shutdown=true")). That is, the memory usage of course increased during the first iteration, but not during the approximately 8.5 million next ones. I will submit tests later. Patch looks good to me, the explanation was clear and very useful.
Good fix. Wonder if adding the explanations in the code, where possible, would be more useful than leaving them in Jira entry.
Bernt expressed interest in commiting the change, so I am not commiting this to SVN. Attached patch with comments where appropriate.
Fixed in revision 290644.
As I have mentioned on derby-dev earlier, there is a problem with the
current fix. After the fix was committed, store/encryptDatabase.sql sometimes failed because one or more rawStoreDaemon threads printed a NullPointerException to System.err. For an example, see: http://www.multinet.no/~solberg/public/Apache/TinderBox_Derby/testlog/SunOS-5.10_i86pc-i386/290646-encryptionCFB_diff.txt The reason for this is that the fix nulls out the reference from the ContextService to the list of ContextManagers when the ContextService is stopped. Some threads with references to the ContextService might still be alive when the ContextService is stopped, and they get NullPointerExceptions when accessing the stopped ContextService. I have only been able to reproduce this behaviour with the store/encryptDatabase.sql test on multi-cpu machines running with a heavy background load. It seems like the threads that get NullPointerExceptions are orphan threads (like the ones described in DERBY-594) that have not yet been able to start because of the heavy background load. The problem might go away when DERBY-594 is resolved, but I think it needs further investigation. I have attached a patch (
NullPointerException messages from rawStoreDaemon threads we have seen in derbyall on some occasions. The NullPointerExceptions are thrown because ContextService.threadContextList is set to null in ContextService.stop(), but some of the daemon threads might still try to access the variable. The patch adds checks for threadContextList being null, and returns null or false in those cases instead of failing with a NullPointerException. Five tests in derbyall failed when this patch was applied, but that seems to be what one should expect these days. All of the failures are also seen in the daily regression tests. Feel free to review and comment on the patch. Thanks. I'm marking this issue as resolved.
The reason why I haven't resolved it before, is that I still see OutOfMemoryError on some platforms. These errors are caused by the background threads running with a lower priority than the the main thread, and therefore many of the background threads don't get enough CPU time to shut down. If the main thread takes a break now and then, I don't see OutOfMemoryError on any platform. That is, while (true) { Class.forName("org.apache.derby.jdbc.EmbeddedDriver").newInstance(); DriverManager.getConnection("jdbc:derby:;shutdown=true"); Thread.sleep(1); } runs reliably with no memory leak on all platforms I have tested. I don't want to add a regression test for this issue, since I fear it will be timing dependent and might cause noise in the nightly testing. I think the issue can be closed because the remaining problem is unlikely to be seen in a real-world application (who will spend all their CPU time on loading and unloading the driver?). If it turns out to be a problem, I see these possible solutions: 1) On shutdown, wait for all background threads to terminate. 2) On shutdown, raise the priority of the background threads. If someone feels this should be addressed, please open a new issue. This issue has been resolved for over a year with no further movement. Closing.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||