Issue Details (XML | Word | Printable)

Key: DERBY-4018
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Knut Anders Hatlen
Reporter: Jeff Clary
Votes: 1
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Derby

ArrayIndexOutOfBoundsException in TopService.inService under heavy multithreaded use of EmbeddedDriver

Created: 13/Jan/09 02:45 PM   Updated: 16/Jul/09 09:24 PM
Component/s: Services
Affects Version/s: 10.4.2.0
Fix Version/s: 10.5.2.0, 10.6.0.0

Time Tracking:
Not Specified

File Attachments:
  Size
Java Source File Licensed for inclusion in ASF works BeatDerbyToPieces.java 2009-01-13 02:48 PM Jeff Clary 6 kB
File Licensed for inclusion in ASF works d4018-1a.diff 2009-06-26 03:11 PM Knut Anders Hatlen 2 kB
File Licensed for inclusion in ASF works d4018-2a.diff 2009-06-29 03:14 PM Knut Anders Hatlen 2 kB
Text File Licensed for inclusion in ASF works program_output.txt 2009-01-13 02:48 PM Jeff Clary 141 kB
File Licensed for inclusion in ASF works sync.diff 2009-01-14 10:13 AM Knut Anders Hatlen 5 kB
Text File Licensed for inclusion in ASF works trunk_stacktrace.txt 2009-01-13 03:49 PM Kristian Waagan 7 kB
Environment:
Windows XP SP2
Intel Core 2 Duo 3GHz, 3GB RAM
Java JDK 1.6.0_07

Issue & fix info: High Value Fix
Resolution Date: 01/Jul/09 10:29 PM
Labels:


 Description  « Hide
I have created a test program that reproduces this issue. The program pounds on Derby by running jobs on a pool of 10 threads. Each job creates a new Derby database, a schema, a table, and an index. Then it shuts down the database and finishes. The program runs until an error occurs or until 1000 jobs have finished.

The problem is very intermittent, and related I think to multithreaded access of a vector "moduleInstances" in TopService.java. I am seeing the error on my machine about half the time I run the test program. Sometimes it happens after 100 or 200 jobs...but sometimes not until 500 or more, or not at all.

I am using the 10.4.2 derby.jar downloaded from http://db.apache.org/derby/releases/release-10.4.2.0.cgi, without any modifications on my part.

I will attach the test program plus output from a run that failed with the ArrayIndexOutOfBoundsException. (The actual exception thrown to the application from Derby is SQLException, but the underlying cause is an ArrayIndexOutOfBoundsException.) BE AWARE that if you run this test program it can use up a GByte or more of disk space in your Java tmp directory.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Jeff Clary added a comment - 13/Jan/09 02:48 PM
A test program that demonstrates the issue, plus the output from a run. Note that this issue is intermittent, and only show up on my machine perhaps half the time I run the test program.

Jeff Clary made changes - 13/Jan/09 02:48 PM
Field Original Value New Value
Attachment program_output.txt [ 12397785 ]
Attachment BeatDerbyToPieces.java [ 12397784 ]
Kristian Waagan added a comment - 13/Jan/09 03:49 PM
Reproduced this with trunk (on the second run) on a Intel Core2 Duo with JDK 1.6 (OpenSolaris).

Stack trace with line numbers attached as 'trunk_stacktrace.txt'.

Kristian Waagan made changes - 13/Jan/09 03:49 PM
Attachment trunk_stacktrace.txt [ 12397794 ]
Kathey Marsden made changes - 13/Jan/09 04:35 PM
Derby Categories [High Value Fix]
Knut Anders Hatlen added a comment - 14/Jan/09 10:13 AM
It looks like the synchronization in TopService is not used consistently. The moduleInstances variable is a Vector with its own internal synchronization, but the intention of the code seems to be that all accesses to it should be synchronized on the TopService object (this). Most of the places the accesses are synchronized on the TopService, but some (for instance where the exception is thrown) are not.

I made an attempt on cleaning up the synchronization (see the attached patch), and also replaced the Vector with an ArrayList to make it clearer that we shouldn't rely on the internal synchronization in the Vector object. Now I don't see the exception, but instead I see a Java level deadlock between TopService and FileMonitor, so there is clearly some more cleanup needed. I'm attaching the patch for reference anyway in case someone wants to have a look at it.

Knut Anders Hatlen made changes - 14/Jan/09 10:13 AM
Attachment sync.diff [ 12397878 ]
Nick Puz added a comment - 05/Mar/09 07:41 PM - edited
Hi,
Any update on this or plans to get a fix into trunk or 10.5 ? I'm also running into this issue - in our application we have a separate derby db for each "user" and the access pattern is: open, do stuff, close. (it's not kept open so a bank of machines can all do it and users can be directed to any of them).

I see now that the above fix hits a deadlock so I'll look more into it and play around. Any suggestions from derby developers, I'm just starting w/ derby src code.

-Nick

Mike Matrigali added a comment - 09/Jun/09 07:13 PM
problem is in the monitor, marking as services component.

Mike Matrigali made changes - 09/Jun/09 07:13 PM
Component/s Services [ 11415 ]
Knut Anders Hatlen made changes - 26/Jun/09 12:06 PM
Assignee Knut Anders Hatlen [ knutanders ]
Knut Anders Hatlen made changes - 26/Jun/09 12:54 PM
Status Open [ 1 ] In Progress [ 3 ]
Knut Anders Hatlen added a comment - 26/Jun/09 03:11 PM
d4018-1a is a partial fix for the problem, and it should be sufficient to fix the ArrayIndexOutOfBoundsException in TopService.inService(). I ran the full repro on two machines on which I could reproduce the problem reliably without the fix, and now I don't see the exception. I don't see the deadlock that I saw with the previous patch either.

This fix factors out two for loops which iterate over moduleInstances into a helper method which synchronizes on the moduleInstances object over the entire loop. This prevents other threads from removing elements from the vector, and therefore the value returned by Vector.size() should still be valid when we call Vector.get(), and no AIOOBE should be thrown. I believe that it is safe, since the calls to size() and get() are already synchronized on the Vector, and the loop doesn't do anything except unsynchronized accessing a field in the object fetched from the Vector. So there should be no new ordering of how the synchronization locks are obtained, which was the problem with the previous patch.

There are still a couple of places where there's an unsynchronized window between the checking of the size and the actual retrieval from the Vector. I'll see if I can address those in a follow-up patch.

Knut Anders Hatlen made changes - 26/Jun/09 03:11 PM
Attachment d4018-1a.diff [ 12411930 ]
Knut Anders Hatlen added a comment - 26/Jun/09 03:41 PM
Derbyall and suites.All ran cleanly with the patch.

Knut Anders Hatlen made changes - 26/Jun/09 03:41 PM
Derby Info [Patch Available]
Repository Revision Date User Message
ASF #789264 Mon Jun 29 08:58:33 UTC 2009 kahatlen DERBY-4018: ArrayIndexOutOfBoundsException in TopService.inService under heavy multithreaded use of EmbeddedDriver

Factored out two for loops and surrounded them with synchronization on
the vector they were iterating over. This prevent others from changing
the size of the vector between the calls to Vector.size() and
Vector.get().
Files Changed
MODIFY /db/derby/code/trunk/java/engine/org/apache/derby/impl/services/monitor/TopService.java

Knut Anders Hatlen added a comment - 29/Jun/09 08:59 AM
Committed revision 789264.

Knut Anders Hatlen made changes - 29/Jun/09 08:59 AM
Derby Info [Patch Available]
Knut Anders Hatlen added a comment - 29/Jun/09 03:14 PM
Patch 2a addresses the remaining issue where there's a possibility
that the size of the Vector changes between the calls to size() and
elementAt() in bootModule().

I only added synchronization on moduleInstances over the two calls
that needed a consistent view of the Vector. Since the synchronization
block does not cover the entire for loop, there is still a possibility
that elements are added to or removed from the Vector between two
retrievals from it. That should be harmless, though, since it should
only make the loop skip one of the modules or look at one of the
modules twice.

If one is skipped, the worst case would be if it was the module we
were looking for so that we boot a module unnecessarily, but this case
is already handled by the method and it'll just shut down the module
when it detects that it has been booted twice.

If one module is looked at twice, it'll just conclude twice that it's
not the module we're looking for and go on to the next one. (Since
modules are always added to the end of the Vector currently, I don't
think this will ever happen with the code as it is today.)

The patch does not introduce the possibility for any of those
situations by the way, they are just as likely without the patch.

The repro still runs without hangs or ArrayIndexOutOfBoundsExceptions,
and both Derbyall and suites.All ran cleanly.

Knut Anders Hatlen made changes - 29/Jun/09 03:14 PM
Attachment d4018-2a.diff [ 12412075 ]
Knut Anders Hatlen made changes - 29/Jun/09 03:14 PM
Derby Info [Patch Available]
Dag H. Wanvik made changes - 30/Jun/09 04:02 PM
Issue & fix info [Patch Available] [High Value Fix]
Dag H. Wanvik made changes - 01/Jul/09 01:50 PM
Issue & fix info [High Value Fix] [High Value Fix, Patch Available]
Repository Revision Date User Message
ASF #790218 Wed Jul 01 15:28:13 UTC 2009 kahatlen DERBY-4018: ArrayIndexOutOfBoundsException in TopService.inService under heavy multithreaded use of EmbeddedDriver

Closed another unsynchronized window between Vector.size() and
Vector.elementAt().
Files Changed
MODIFY /db/derby/code/trunk/java/engine/org/apache/derby/impl/services/monitor/TopService.java

Knut Anders Hatlen added a comment - 01/Jul/09 03:30 PM
Committed 2a with revision 790218.

Will keep the issue open until the fixes have been back-ported to the 10.5 branch.

Knut Anders Hatlen made changes - 01/Jul/09 03:30 PM
Fix Version/s 10.6.0.0 [ 12313727 ]
Issue & fix info [Patch Available, High Value Fix] [High Value Fix]
Repository Revision Date User Message
ASF #790412 Wed Jul 01 22:28:18 UTC 2009 kahatlen DERBY-4018: ArrayIndexOutOfBoundsException in TopService.inService under heavy multithreaded use of EmbeddedDriver

Merged fix from trunk (revisions 789264 and 790218).
Files Changed
MODIFY /db/derby/code/branches/10.5
MODIFY /db/derby/code/branches/10.5/java/engine/org/apache/derby/impl/services/monitor/TopService.java

Knut Anders Hatlen added a comment - 01/Jul/09 10:29 PM
Merged to 10.5 and committed revision 790412.

Knut Anders Hatlen made changes - 01/Jul/09 10:29 PM
Status In Progress [ 3 ] Resolved [ 5 ]
Fix Version/s 10.5.1.2 [ 12313870 ]
Resolution Fixed [ 1 ]
Kathey Marsden made changes - 16/Jul/09 09:24 PM
Fix Version/s 10.5.2.0 [ 12314116 ]
Fix Version/s 10.5.1.2 [ 12313870 ]