|
[
Permalink
| « Hide
]
Jeff Clary added a comment - 13/Jan/09 02:48 PM
A test program that demonstrates the issue, plus the output from a run. Note that this issue is intermittent, and only show up on my machine perhaps half the time I run the test program.
Jeff Clary made changes - 13/Jan/09 02:48 PM
Reproduced this with trunk (on the second run) on a Intel Core2 Duo with JDK 1.6 (OpenSolaris).
Stack trace with line numbers attached as 'trunk_stacktrace.txt'.
Kristian Waagan made changes - 13/Jan/09 03:49 PM
Kathey Marsden made changes - 13/Jan/09 04:35 PM
It looks like the synchronization in TopService is not used consistently. The moduleInstances variable is a Vector with its own internal synchronization, but the intention of the code seems to be that all accesses to it should be synchronized on the TopService object (this). Most of the places the accesses are synchronized on the TopService, but some (for instance where the exception is thrown) are not.
I made an attempt on cleaning up the synchronization (see the attached patch), and also replaced the Vector with an ArrayList to make it clearer that we shouldn't rely on the internal synchronization in the Vector object. Now I don't see the exception, but instead I see a Java level deadlock between TopService and FileMonitor, so there is clearly some more cleanup needed. I'm attaching the patch for reference anyway in case someone wants to have a look at it.
Knut Anders Hatlen made changes - 14/Jan/09 10:13 AM
Hi,
Any update on this or plans to get a fix into trunk or 10.5 ? I'm also running into this issue - in our application we have a separate derby db for each "user" and the access pattern is: open, do stuff, close. (it's not kept open so a bank of machines can all do it and users can be directed to any of them). I see now that the above fix hits a deadlock so I'll look more into it and play around. Any suggestions from derby developers, I'm just starting w/ derby src code. -Nick problem is in the monitor, marking as services component.
Mike Matrigali made changes - 09/Jun/09 07:13 PM
Knut Anders Hatlen made changes - 26/Jun/09 12:06 PM
Knut Anders Hatlen made changes - 26/Jun/09 12:54 PM
d4018-1a is a partial fix for the problem, and it should be sufficient to fix the ArrayIndexOutOfBoundsException in TopService.inService(). I ran the full repro on two machines on which I could reproduce the problem reliably without the fix, and now I don't see the exception. I don't see the deadlock that I saw with the previous patch either.
This fix factors out two for loops which iterate over moduleInstances into a helper method which synchronizes on the moduleInstances object over the entire loop. This prevents other threads from removing elements from the vector, and therefore the value returned by Vector.size() should still be valid when we call Vector.get(), and no AIOOBE should be thrown. I believe that it is safe, since the calls to size() and get() are already synchronized on the Vector, and the loop doesn't do anything except unsynchronized accessing a field in the object fetched from the Vector. So there should be no new ordering of how the synchronization locks are obtained, which was the problem with the previous patch. There are still a couple of places where there's an unsynchronized window between the checking of the size and the actual retrieval from the Vector. I'll see if I can address those in a follow-up patch.
Knut Anders Hatlen made changes - 26/Jun/09 03:11 PM
Derbyall and suites.All ran cleanly with the patch.
Knut Anders Hatlen made changes - 26/Jun/09 03:41 PM
Committed revision 789264.
Knut Anders Hatlen made changes - 29/Jun/09 08:59 AM
Patch 2a addresses the remaining issue where there's a possibility
that the size of the Vector changes between the calls to size() and elementAt() in bootModule(). I only added synchronization on moduleInstances over the two calls that needed a consistent view of the Vector. Since the synchronization block does not cover the entire for loop, there is still a possibility that elements are added to or removed from the Vector between two retrievals from it. That should be harmless, though, since it should only make the loop skip one of the modules or look at one of the modules twice. If one is skipped, the worst case would be if it was the module we were looking for so that we boot a module unnecessarily, but this case is already handled by the method and it'll just shut down the module when it detects that it has been booted twice. If one module is looked at twice, it'll just conclude twice that it's not the module we're looking for and go on to the next one. (Since modules are always added to the end of the Vector currently, I don't think this will ever happen with the code as it is today.) The patch does not introduce the possibility for any of those situations by the way, they are just as likely without the patch. The repro still runs without hangs or ArrayIndexOutOfBoundsExceptions, and both Derbyall and suites.All ran cleanly.
Knut Anders Hatlen made changes - 29/Jun/09 03:14 PM
Knut Anders Hatlen made changes - 29/Jun/09 03:14 PM
Dag H. Wanvik made changes - 30/Jun/09 04:02 PM
Dag H. Wanvik made changes - 01/Jul/09 01:50 PM
Committed 2a with revision 790218.
Will keep the issue open until the fixes have been back-ported to the 10.5 branch.
Knut Anders Hatlen made changes - 01/Jul/09 03:30 PM
Merged to 10.5 and committed revision 790412.
Knut Anders Hatlen made changes - 01/Jul/09 10:29 PM
Kathey Marsden made changes - 16/Jul/09 09:24 PM
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||