|
[
Permlink
| « Hide
]
Rajesh Kartha added a comment - 10/Oct/06 03:45 AM
Attaching the files I mentioned in the description of this issue.
Rajesh Kartha made changes - 10/Oct/06 03:45 AM
Dag H. Wanvik made changes - 23/Mar/07 04:44 PM
My analysis of this problem:
* When EmbedConnection#close is called from testEmbedAndClient.java:50, the call fails to clean up the connection properly, since the call to the private EmbedConnection#close(StandardException) never happens. This is due to the fact that the public close method first checks if the connection is already closed via a call to EmbedConnection#isClosed(). If so, close just returns, EmbedConnection#isClosed() returns true even though the connection is still marked as active, because the underlying database has been shut down. This fact makes TransactionResourceImpl#isActive() return false. Now, the connection objects and the TransactionResourceImpl will eventually be garbage collected, but the ContextManager object owned by TransactionResourceImpl, will not, leading to the memory leak. This is because the ContextManager object is referenced from the HashSet 'allContexts' of the ContextService. Normally, the context manager object is removed from 'allContexts' when the EmbedConnection#close(exceptionclose) calls TransactionResourceImpl#cleanupOnError. However, it does not happen here since isClosed returns true, so EmbedConnection#close() never gets to call EmbedConnection#close(exceptionclose). Modifying the repro to never close the connection object, I also see the leak, since EmbedConnection#finalize similarly fails to call EmbedConnection#close(exceptionclose) when isClosed() returns true. More detail (feel free to skip ;) TransactionResourceImpl#cleanupOnError, in turn, calls ContextManager#cleanupOnError which, when popping its stack of contexts finally calls the stack's bottom SystemContext#cleanupOnError, which again makes this call on line 62: : getContextManager().owningCsf.removeContext(getContextManager()); : which releases the context manager from 'allContexts' so it can be garbage collected. * I have made a patch which makes the repro (and the modified repro I mentioned) work without running out of space: It modifies EmbedConnection#close to clean up in the case described above. It also adds a finalizer method to TransactionResourceImpl which calls ContextManager#cleanupOnError with an argument of StandardException.closeException(), if not done already. This catches the case when no close is called on the connection. This change would make the leak go away on its own for both repro cases, but I added the change to EmbedConnection#close also, since it purports to release database resources in its Javadoc. So, when the connection is closed or garbage collected, the ContextManager will be cleaned up as well, and the leak is plugged. suites.All and derbyall ran without incident on Sun JDK 1.6, Solaris 10/x86. * There is a related issue, produced a patch. They overlap, in that that both patches makes a changes to EmbedConnection#close, so we need to reconcile them for that method.
Dag H. Wanvik made changes - 24/Mar/07 04:32 PM
Dag H. Wanvik made changes - 24/Mar/07 04:32 PM
Please disregard the paragraph quoted below in my previous comment, the
correct version is given below (the original text reflects my first attempt at the patch :) > It also adds a finalizer method to TransactionResourceImpl which > calls ContextManager#cleanupOnError with an argument of > StandardException.closeException(), if not done already. It also adds a finalizer method to EmbedConnection which calls TransactionResourceImpl#cleanupOnError with an argument of StandardException.closeException(), if not done already. It seems like it would be nice if, instead of adding more code to
EmbedConnection.close(), we could instead just have EmbedConnection.close() call EmbedConnection.close(StandardException), and have all the EmbedConnection closing logic in one place. Right now there seems to be logic in two EmbedConnection.close() methods, and also in the finalize() method. Can it all be moved to the one close(exception) method and have the other two be 1-line wrappers? Thanks for looking at this, Bryan!
Uploading a modified patch which addresses Bryan's comment, although I did not *quite* manage to make it a one-liner ;) I also noticed that the isClosed() method changes the internal state variable 'active', thwarting the logic needed to get the cleanup done, so I removed the call to setInactive() from isClosed(). I believe this is safe. [Alterately, one call do all the cleanup needed from inside isClosed() if the underlying transaction is not active.] I further simplified the finalizer by removing the test for rootConnection since this is done inside close(StandardException) anyway. I verified the patch against the repro again, including the modified one I described for the previous patch. I did not add a new test for this issue. I ran derbyall and suites.All again with the patch against svn 521401 with successfully (modulo the often seen dblook error in derbyall), Solaris 10/x86, Sun JDK1.6.
Dag H. Wanvik made changes - 27/Mar/07 10:00 PM
Hi Dag, the new patch looks very good to me. This sort of close
logic always seems to be a bit finicky, but I think you've got it nicely packaged now.
John H. Embretsen made changes - 28/Mar/07 08:08 AM
I was getting ready to commit the patch for this issue, but then I
updated my trunk and found that the following newly introduced/converted JUnit test makes Derby deadlock when the patch ( ProcedureInTriggerTest It looks like a deadlock in the VM (see also enclosed stacks of involved threads) caused by the patch's use of finalizer code which leads to synchronization, and I am at a loss presently for how to resolve this best. What seems to happen is this: * In ProcedureInTriggerTest:388, a call to Stement#execute is performed: > s.execute("create trigger select_from_trig_table no cascade before " + "delete on t1 for each STATEMENT call proc_reads_sql(1)"); This call synchronizes on the current (root) connection object. * At this time I believe there are still two nested connection objects from previous statements (the test uses triggers and stored procedures called from them), which are no longer referenced but for whom the finalizer has yet to run (not yet garbage collected). * Now, the CreateTriggerConstantAction#executeConstantAction in line 275 uses the Dependency manager to send an invalidate to the trigger table so that DML statements on this table will be recompiled. (see stack of thread "main" in attachment ProcedureInTriggerTest-threadstacks.txt). * As a result of this, GenericPreparedStatement#prepareToInvalidate is called which again calls GenericLanguageConnectionContext#verifyNoOpenResultSets * verifyNoOpenResultSets contains this sequence of calls: // There may be open ResultSet's that are yet to be garbage collected // let's try and force these out rather than throw an error System.gc(); System.runFinalization(); * This last call causes deadlock. This seems to happens because the said nested connection's finalizers are attempted executed. With the patch, the finalizer of EmbedConnection contains a call to EmbedConnection#close which synchronizes on the root connection object (already locked by the "main" thread as part of the execute). The execution of the finalizer happens in a thread forked by the call to System.runFinalization(), which hangs waiting for the finalizer thread to complete (see Finalizer#forkSecondaryFinalizer's call to sft.join() ca line 115 in Sun JDK 1.6). Now, the finalizers are stuck waiting for the lock held my "main", so deadlock ensues. * Now, without the call to System.runFinalization(); the test runs successfully, presumably because the two connection objects will then be garbage collected later, after s.execute() has terminated. Also, removing the call to System.gc() removes the deadlock, but this breaks the test however (verifyNoOpenResultSets throws X0X95.S - LANG_CANT_INVALIDATE_OPEN_RESULT_SET in line 1733): "Operation 'CREATE TRIGGER' cannot be performed on object 'T1' because there is an open ResultSet dependent on that object." So it would seem that calling System.gc() is sufficient, but System.runFinalization is harmful, since it will wait for finalization to finish, which is a dangerous thing if the thread already holds locks. *Or* one could argue that the finalizers should never call anything that could require a lock (too strict in my view). So, now I am trying to understand what would be the correct approach to resolve this. - Without the call to close in the finalizer of EmbedConnection, we have the HashSet 'allContexts' of the ContextService may leak. - With the call to close in the finalizer of EmbedConnection, we risk deadlock as long as any Derby code calls System.runFinalization() anywhere. Grepping, i see this happens in three places in the engine: LowMemory.java:95: System.runFinalization(); GenericLanguageConnectionContext.java:1617: System.runFinalization(); GenericLanguageConnectionContext.java:1709: System.runFinalization(); Possible approaches: 1) Maybe we could change the HashSet 'allContexts' of the ContextService to use weak hash map perhaps, a WeakHashSet, if there is such a thing. That way, the ContextManager objects would be cleaned eventually. 2) Remove the explicit calls to System.runFinalization. It seems that keeping just the System.gc() call is less risky, but may be less efficient, so could break existing apps? The System.gc() is not guaranteed to yield results anyway.. 3) Figure out a way to explicitly close the result sets in question, so the gc calls would be redundant. I am not sure how easy this would be; they are probably there for a good reason. 4) Remove the calls and allow the execution to fail as indicated in the comment of verifyNoOpenResultSets: // There may be open ResultSet's that are yet to be garbage collected // let's try and force these out rather than throw an error System.gc(); System.runFinalization(); This could break existing apps. 5) other ideas..? Finally, can this deadlock be considered a Java run-time error? If not, what is the contract being broken by Derby app code in the above? I tried to search a bit around advisability of explicitly running runFinalization, in the presence of held Java locks, and finalizers which can cause locks, but I didn't find anything conclusive... Any advise is appreciated :)
Dag H. Wanvik made changes - 18/Apr/07 12:04 AM
Hi Dag,
>1) Maybe we could change the HashSet 'allContexts' of the > ContextService to use weak hash map perhaps, a WeakHashSet, if there > is such a thing. That way, the ContextManager objects would be > cleaned eventually. Maybe you could press java.util.WeakHashMap into service. Its javadoc suggests the following issues: a) You would need to choose something innocuous for the values in this map. You're only interested in the keys and garbage-collectible values would need to be wrapped in WeakReferences. b) ContextService.notifyAllActiveThreads() would need to be insulated from disappearing keys. Also, for the record, it looks to me as though the Thread.interrupt() call needs to be wrapped in a privileged action block. A data point: In addition to the patch, I removed both calls to
System.runFinalization() from GenericLanguageConnectionContext.java, but kept the calls to System.gc(). I ran suites.All end derbyall successfully with both Sun JDK1.4.2 and 1.6 (sane jars), so I wonder if this may be the simplest way to resolve this. Does anyone know if/why the calls to System.runFinalization() are (sometimes only, it would seem) necessary? The calls to System.runFinalization() has been there since incubation as far as I can tell. In any case, I will prepare a patch which contains this change as well, maybe then someone can test on other VMs; I wonder whether the behavior could differ in a significant way between VMs here. Uploaded patch
that two calls to System.runFinalization have been commented out, please see previous comment.
Dag H. Wanvik made changes - 18/Apr/07 05:06 PM
More data:
Unfortunately, without the call to System.runFinalization(), ProcedureInTriggerTest fails on Linux with Sun JDKs with LANG_CANT_INVALIDATE_OPEN_RESULT_SET. Keeping the call, it deadlocks, as on Solaris. For the IBM 1.5 VM, keeping the call, there is no deadlock, but the error happens. As soon as the finalizer of the nested EmbedConnection objects (non-root) synchronize, the failure happens, but at least the VM doesn't deadlock... Presumably the failure happens because the finalizers can't complete when gc() and runFinalization() is called, barring the result sets from being garbage collected as well. I'll have to try another tack than removing the call to .. to complete the sentence of the previous comment:
I'll have to try another tack than removing the call to System.runFinalization(). > Presumably the failure happens because the finalizers can't complete
> when gc() and runFinalization() is called, barring the > result sets from being garbage collected as well. It could also be that gc() just makes the garbage collection start, possibly in another thread, so that runFinalization() is called before the objects are marked as unreferenced. IIRC, gc() doesn't block whereas runFinalization() does. Note that the IBM 1.5 VM implemented a much more aggressive gc than previous versions. I remember that many tests had to be rewritten when people started running tests with that VM because ResultSets were closed earlier, etc. That probably explains why you don't see the deadlock in runFinalization() on IBM 1.5, since the objects most likely are gc'ed and finalized before you come to that point. Thanks for looking at this Knut!
> It could also be that gc() just makes the garbage collection start, > possibly in another thread, so that runFinalization() is called before > the objects are marked as unreferenced. IIRC, gc() doesn't block > whereas runFinalization() does. On Solaris, gc() alone manages to clear up the result set objects in time: Not sure how that works; possibly the marking of the nested connection objects makes it possible to also mark the result set objects which are then effectively collected in time for the test in #prepareToInvalidate() even though the connection objects can't yet be finalized (blocked). As you say, since gc() doesn't block, there is no deadlock, and it works. > Note that the IBM 1.5 VM implemented a much more aggressive gc than > previous versions. I remember that many tests had to be rewritten when > people started running tests with that VM because ResultSets were On Linux with SUN VM and with the IBM VM, gc() alone are not able to free the result set objects in time, but this is could be timing dependent. > closed earlier, etc. That probably explains why you don't see the > deadlock in runFinalization() on IBM 1.5, since the objects most > likely are gc'ed and finalized before you come to that point. No, I get the error with the IBM VM too, so the calls are necessary. But they will not succeed in finalizing the nested connection objects (unless they are collected earlier but result set objects are not) but at least it doesn't deadlock. I will upload a new patch which avoids synchronization in the finalizer of nested connection objects, which solves the deadlock problem. This version of the patch,
It removes the call to close() from the finalizer of EmbedConnection for nested connection objects introduced in earlier versions of this patch as it is not strictly necessary. This solves the deadlock issue. suites.All and derbyall + the repro run cleanly with Sun 1.4 on Solaris. I have verified the repro and ProcudureInTriggerTest against Sun 1.4 and IBM 1.5 on Linux as well. Unless I get feedback on this version I will commit it on Monday.
Dag H. Wanvik made changes - 20/Apr/07 01:52 PM
Dag H. Wanvik made changes - 23/Apr/07 10:47 PM
Dag H. Wanvik made changes - 25/Apr/07 12:13 AM
Dag H. Wanvik made changes - 16/Nov/07 03:13 PM
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||