Issue Details (XML | Word | Printable)

Key: DERBY-2176
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: John H. Embretsen
Reporter: John H. Embretsen
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Derby

OutOfMemoryError in Network Server after running DOTS-like test for weeks

Created: 14/Dec/06 09:13 AM   Updated: 09/Feb/07 11:36 AM
Return to search
Component/s: Network Server
Affects Version/s: 10.2.1.6
Fix Version/s: 10.2.2.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works derby.log 2006-12-14 01:59 PM John H. Embretsen 2 kB
Java Source File Licensed for inclusion in ASF works Derby2176repro.java 2006-12-22 10:52 AM John H. Embretsen 9 kB
Text File Licensed for inclusion in ASF works jmap-histo_server_17Nov1020CET.txt 2006-12-14 01:59 PM John H. Embretsen 42 kB
Image Attachments:

1. OU_10.2.1.6_vs_10.2.2.0.png
(8 kB)

2. OU_jstat-gc-t.png
(5 kB)

3. OU_jstat-gc-t.png
(7 kB)

4. OU_jstat-gc-t_10.2.2.0_server.png
(5 kB)
Environment: Solaris 10 x86_64 MU2, remote client-server, 2 x 2.4 GHz AMD Opteron, 2048 MB RAM, Sun HotSpot 1.5.0_07 server JVM
Issue Links:
Reference
 

Urgency: Normal
Resolution Date: 09/Feb/07 11:36 AM


 Description  « Hide
Running a modified version of the Database Opensource Test Suite (DOTS) (v1.1.1) test case ATCJ2 (auction "simulation"), available from http://sourceforge.net/project/showfiles.php?group_id=3382, resulted in an "OutOfMemoryError: Java heap space" after 57 days, 13 hours and 21 minutes.

The test run has previously been described on derby-dev: http://www.nabble.com/Long-running-testing-of-10.2.1.5-t2382591.html.
The modifications to the original test case were mainly to avoid the type of errors described in http://www.nabble.com/OutOfMemoryErrors-when-testing-Derby-with-DOTS-t1010027.html. Indeed, this is a different, yet similar error.

Heap size (both client and server JVMs) was:
  -Xms128m
  -Xmx128m

The test was run against the 10.2.1.5 release candidate, but since this version does not exist in Jira, and the delta between 10.2.1.5 and the released 10.2.1.6 is minimal, 10.2.1.6 is marked as the "affects version".


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
John H. Embretsen added a comment - 14/Dec/06 01:59 PM
Attached the following files, to help debugging efforts:

   * derby.log
   * OU_jstat-gc-t.png
       - graph of tenured space memory usage (error occurred at about 1381 hrs (x-axis unit))
   * jmap-histo_server_17Nov1020CET.txt
       - textual histogram of objects in the heap approx. 6 days before the error

I plan to make more logs and diagnostics info (including a heap dump) available soon.

Timing of events:
   * Network Server started: 2006-09-26 14:20 CEST
   * Test client started: 2006-09-26 14:23 CEST (+3 min)
   * First OutOfMemoryError occured: 2006-11-23 01:41 CET (+1381 hrs (57 days, 13 hrs), 21 min)
   * Test + server stopped: ca. 2006-12-12 15:05 CET (+1849 hrs, 28 min)


John H. Embretsen added a comment - 14/Dec/06 03:39 PM
Attached new and correct OU_jstat-gc-t.png (previous was from a different test).

Knut Anders Hatlen added a comment - 14/Dec/06 03:45 PM
I'm just guessing here, but the histogram of objects seems similar to what I would expect from DERBY-912, which is a tiny memory leak that did not get fixed in 10.2.1.6. (Should be fixed in 10.2.2.0 though.) That bug caused a vector to grow by one for each sort operation performed on a connection, and the space would not be freed until the connection was closed. So if the DOTS test reuses connections and executes a query which performs a sort every now and then, it will very slowly run out of memory. The symptoms fit because the object histogram shows that most of the heap space is used for Object[] (because the vector has grown) and that there are instances of SortResultSet.

John H. Embretsen added a comment - 14/Dec/06 04:44 PM
Thank you Knut Anders, that is an interesting theory! And if it is correct, it seems like the problem has already been fixed... I started a new test run using 10.2.2.0, so we'll see in a couple of months ;)

I see from derby.log that "Failed Statement is: SELECT BIDERID, BID_PRICE, BID_TIME FROM BID WHERE ITEMID = ? ORDER BY BID_TIME DESC", and I was not able (from a quick code scan) to find any other statements utilizing sorting functionality, so this may be the sinner. There is about 20% chance that the test executes this statement per iteration, so I guess this amounts to a few million ORDER BYs in the course of these 57+ days.

John H. Embretsen added a comment - 19/Dec/06 09:52 AM
Logs, diagnostics info, graphs and (zipped) heap dump taken at the point where the OutOfMemoryError occurred is available at http://dbtg.thresher.com/derby/test/debug/DERBY-2176/.

John H. Embretsen added a comment - 22/Dec/06 10:52 AM
Attached a graph (OU_10.2.1.6_vs_10.2.2.0.png) showing a rough comparison of 10.2.1.6 and 10.2.2.0 when it comes to memory usage (tenured space) when an ORDER BY query is executed multiple times.

I created a smaller (and faster) test case in which a similar ORDER BY query is executed repetitively (using a PreparedStatement). The attached graph shows results from this test running 1.2 million ORDER BY queries. The test case is also attached (Derby2176repro.java).

10.2.1.6 memory usage looks very similar to what was seen during the DOTS-based long running test for 10.2.1.5, while 10.2.2.0 memory usage is completely stable and much lower.

Based on these results I now feel quite confident that this issue is the same as the one reported and fixed in DERBY-912, but I will wait a while longer (and do more testing) before I resolve/close this issue.

John H. Embretsen added a comment - 09/Feb/07 11:30 AM
Attaching 'OU_jstat-gc-t_10.2.2.0_server.png', which is a graph showing utilization of the tenured space in the heap of the server VM during a long running DOTS run against 10.2.2.0.

The test has now been running for about 58 days (the 10.2.1.5 test ran for 57 days, 13 hours before the OOME), and is showing no signs of ever-increasing memory usage. Memory usage in this test has never been more stable.

John H. Embretsen added a comment - 09/Feb/07 11:36 AM
This issue was caused by DERBY-912, fixed in SVN revs 467578 (trunk) and 467580 (10.2 branch). Fix is present in 10.2.2.0, verified by re-running the test for (at least) 58 days.