|
Kathey Marsden made changes - 06/Feb/09 01:00 AM
Marking this as a regression since the problem is 10 times as bad on trunk.
Kathey Marsden made changes - 06/Feb/09 01:02 AM
From ClobGrowth:
String updateString = repeatChar("a", 33000); // Anything over 32KB Does this mean that you don't see the issue if the Clob is less than 32 KB, which happens to be the largest page size Derby supports? Yes, I failed to mention that if the clob is less than 32K there is no abnormal growth.
In an effort to understand why the growth is worse on trunk than 10.4, I synched back to rev 635491 (the revision at which 10.4 branched). At that trunk revision the problem was still much worse than on the latest of the 10.4 branch. I'm not quite sure what's going on. I will keep looking at it. It is likely user error.
Unmarking regression. The trunk difference that I was seeing was due to different behavior with different jvms. With the IBM 1.6 jvm the growth is much worse.
java version "1.6.0" Java(TM) SE Runtime Environment (build pwi3260sr3-20081106_07(SR3)) IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Windows XP x86-32 jvmwi3260-20081105_25433 (JIT enabled, AOT enabled) J9VM - 20081105_025433_lHdSMr JIT - r9_20081031_1330 GC - 20081027_AB) JCL - 20081106_01 than with the Sun 1.6 jvm java version "1.6.0_01" Java(TM) SE Runtime Environment (build 1.6.0_01-b06) Java HotSpot(TM) Client VM (build 1.6.0_01-b06, mixed mode) regardless of codeline.
Kathey Marsden made changes - 06/Feb/09 05:46 PM
in the simplest case a row with a single long clob looks something like
this: main page 1 (access level reads this row): slot 0: long column just as short pointer to list of overflow page ---> page 2 -> page 3 -> ... When an update of a long row happens we basically change slot 0 row's column to point somewhere else but leave the old chain disconnected: slot 0: long column: --->page 100 ->page 101 -> .... We can't mark the old chain free until commit, since we need to guarantee space in case of an abort. But if we lose the post commit for any reason then the chain is lost, we dont have any back pointers. So it is very bad to lose this kind of space reclamation. In the normal deleted head row, we can always re-walk the pages and find the deleted rows again in future. Access is in charge of posting post commit events for page 1, but it is up to raw store layer to handle freeing the overflow pages (here 2 and onward). I believe this work is handles by the ReclaimSpace class in java/engine/org/apache/derby/impl/store/raw/data probably the one commented as "reclaim column chain" The real work I believe is done in ReclaimSpaceHelper.java, same directory. My first guess would be that somehow the concurrency is causing the reclaim space to fail, probably on a latch/latch conflict and the code does not handle it. Depending on what the error is we should probably wait and/or retry, or maybe requeue the whole post commit. I suggest some prints to the log in this routine and compare the working vs non-working test cases. It looks like there are some traces you can enable first to see if it gives any good info: if (SanityManager.DEBUG_ON(DaemonService.DaemonTrace)) { SanityManager.DEBUG( DaemonService.DaemonTrace, " aborted " + work + " because container is locked or dropped"); } The following code looks like it could be the problem: if (!container_rlock.lockRecordForWrite( tran, headRecord, false /* not insert */, false /* nowait */)) { // cannot get the row lock, retry tran.abort(); if (work.incrAttempts() < 3) return Serviceable.REQUEUE; else return Serviceable.DONE; } If the above is the problem, I would be interested if either of these fix the problem: 1) change the nowait flag to wait (not a great thing to do in a post commit background thread). 2) get rid of the 3 and see if it eventually succeeds. For this test case and multiprocessor concurrency it may never get in. Thanks Mike for the hints.
with derby.debug.true=DaemonTrace we never reach the "aborted" case and from the log it looks like it is happily trying to do the space reclamation. Attached are the derby logs with and without growth with the property set. There is no appreciable difference to my untrained eye. I'll keep digging.
Kathey Marsden made changes - 06/Feb/09 07:39 PM
The offending code is similar to the code Mike mentioned, but a little later:
// Reclaiming a long column chain due to update. The long column // chain being reclaimed is the before image of the update // operation. // long headPageId = ((PageKey)headRecord.getPageId()).getPageNumber(); StoredPage headRowPage = (StoredPage)containerHdl.getPageNoWait(headPageId); if (headRowPage == null) { // Cannot get page no wait, try again later. tran.abort(); if (work.incrAttempts() < 3) return Serviceable.REQUEUE; else return Serviceable.DONE; } If I bump it to work.incrAttempts() < 10000 I see no growth. But my SpaceTable query looks a little weird to me. SELECT * FROM new org.apache.derby.diag.SpaceTable('APP','CLOBTAB') t CONGLOMERATENAME |ISIND&|NUMALLOCATEDPAGES |NUMFREEPAGES |NUMUNFILLEDPAGES |PAGESIZE |ESTIMSPACESAVING ------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------------------------------------------------------------------- CLOBTAB |0 |3 |12 |1 |32768 |393216 SQL090206122912800 |1 |1 |0 |1 |4096 |0 Shouldn't the NUMALLOCATEDPAGES be greater than NUMFREEPAGES + NUMFILLEDPAGES? note - the column is NUMUNFILLEDPAGES pages not NUMFILLED. See below for comments from the diag SpaceTable javadoc about what the fields mean.
<P>The SpaceTable virtual table has the following columns: <UL> <LI>CONGLOMERATENAME varchar(128) - nullable. The name of the conglomerate, which is either the table name or the index name. (Unlike the SYSCONGLOMERATES column of the same name, table ID's do not appear here).</LI> <LI>ISINDEX SMALLINT - not nullable. Is not zero if the conglomerate is an index, 0 otherwise.</LI> <LI>NUMALLOCATEDPAGES bigint - not nullable. The number of pages actively linked into the table. The total number of pages in the file is the sum of NUMALLOCATEDPAGES + NUMFREEPAGES.</LI> <LI>NUMFREEPAGES bigint - not nullable. The number of free pages that belong to the table. When a new page is to be linked into the table the system will move a page from the NUMFREEPAGES list to the NUMALLOCATEDPAGES list. The total number of pages in the file is the sum of NUMALLOCATEDPAGES + NUMFREEPAGES.</LI> <LI>NUMUNFILLEDPAGES bigint - not nullable. The number of unfilled pages that belong to the table. Unfilled pages are allocated pages that are not completely full. Note that the number of unfilled pages is an estimate and is not exact. Running the same query twice can give different results on this column. </LI> <LI>PAGESIZE integer - not nullable. The size of the page in bytes for that conglomerate. </LI> <LI>ESTIMSPACESAVING bigint - not nullable. The estimated space which could possibly be saved by compressing the conglomerate, in bytes.</LI> </UL> i am surprised it is latch contention and not lock contention, but does make sense that it becomes bigger and bigger problem the more concurrent the JVM and multi-processors. In this case I think we should just wait on the latch rather than retry.
Try getPage() instead of getPageNoWait() When this is fixed the release note should mention that the fix can't resolve past lost space, only future updates. Only an
offline full compress of the table is guaranteed to get all the missing space caused by this problem in the past. This diff just adds some more sanity manager logging to the 3 spots where if the retry's fail we will lose space until a full compress table is done.
Mike Matrigali made changes - 06/Feb/09 09:50 PM
Attached is a patch for this issue. Per Mike's suggestion I changed the code to call getPage() instead of getPageNoWait. I kept the null check but got rid of the retries. I am not totally sure that the null check is still needed though.
I included Mike's debug statements for the other cases where space may not be reclaimed after three tries. After this issue is resolved I will log another Jira issue to address those cases. I don't want to try to address them in this patch because I don't have reproducible cases for them. Also included is a regression test. I am running tests now.
Kathey Marsden made changes - 06/Feb/09 11:53 PM
Need a release note to indicate that a full offline compress is needed to reclaim space consumed by this issue. The fix will prevent future growth but won't reclaim old space.
Kathey Marsden made changes - 06/Feb/09 11:58 PM
Kathey Marsden made changes - 07/Feb/09 04:20 PM
Regression tests passed except for DERBY-3993. Please review.
Kathey Marsden made changes - 07/Feb/09 05:39 PM
I applied the patch and ran the test, which succeeded.
When I removed the fix in ReclaimSpaceHelper, the test failed as expected. I have also started the regression tests. Besides from two formatting issues (line 41 and 93 in the diff), the patch looks good to me. I can't say if calling getPage instead of getPageNoWait is okay without looking more into the code (i.e. how / if it affects the flow), but I see that you have gotten confirmation on this already. I also understand that if we get into the situation that causes growth, a sane Derby build will fail iff a specific debug flag is set, otherwise it will forget about the "lost space" and continue to operate as normal. Thanks Kristian for reviewing the patch. Just to clarify, the new debug statements won't cause a sane build with derby.debug.true=DaemonTrace set to fail. It will just cause additional debug statements to print to the derby.log. After this issue is closed I'll file another issue to investigate and fix space reclamation for the other two cases where we give up after 3 tries.
Thanks for correcting me, Kathey.
My test run just completed, and only testStressMulti failed. Attached is the release note. It doesn't fit very well into our template but should get the message across.
Kathey Marsden made changes - 09/Feb/09 05:21 PM
the patch and the test look good to me. I have the following suggestions which are not necessary for committing the fix.
o maybe add the same comment from other places in the code to the new code that replaces the retry. I can't think of any reason we would get a null here so I don't think retry is worth it, and shutting down the system is not going to help, but it is a silent failure of the reclaim code - around line 380: // If code gets here, the space will be lost forever, and // can only be reclaimed by a full offline compress of the // table/index. o i think the test for checkin is fiine. It might be interesting to just see what happens if you bump the number of concurrent threads way up to 100 or so. It may not matter much if the machine you run on is only dual core, but probably worth one run. maybe we can get someone on the list with an 8+ way to run the test with concurrent users bumped up? o i think it would be worth it to file at least jira tasks for investigating the other 2 paths in the code that might leave unreclaimed space. as I said I am surprised the row lock path didn't fire. I think one could make the table open path fire if the test were modified to force table level exclusive locking - hopefully a much less likely scenario for user applications. it might take different statements also.
Mike Matrigali made changes - 09/Feb/09 06:04 PM
attached is an updated patch derby-4050_diff2.txt which
- fixes formatting issues pointed out by Kristian. - Adds recommended debug output suggested by Mike. - Makes it easy to configure the number of threads for ClobReclamationTest. I ran with 100 threads and saw 201 NUMALLOCATEDPAGES which is expected I think for 100 rows. Without the patch it grew to something so scary I aborted the test for fear of running out of space on my laptop. The checked in test will remain at 2 threads. - Changes the test to set derby.debug.true to DaemonThread so we can see any interesting output as it runs.
Kathey Marsden made changes - 09/Feb/09 09:13 PM
I ran the latest patch on a 32 way machine with 100 threads. The test succeeded.
FYI, it took over an hour to complete. I attached the log file as 'derby.log-32w100t.txt', in case anyone wants to have a look.
Kristian Waagan made changes - 10/Feb/09 10:38 AM
Thanks Kristian for running the long test. I don't see anything in the log indicating we are hitting one of the places where we give up on reclamation, but I am kind of surprised the log is so small. Anyway, unless someone objects, I will commit this to trunk later this morning.
Thanks Kathey
ommitted revision 743023 to trunk. I plan to backport this one all the way to 10.1 after the nightlies run cleanly so will leave this open until that task is complete.
Kathey Marsden made changes - 10/Feb/09 05:16 PM
Kathey Marsden made changes - 13/Feb/09 01:01 AM
DERBY-4054 is another issue where long columns may not get reclaimed.
Kathey Marsden made changes - 13/Feb/09 01:01 AM
DERBY-4055 is another issue where long columns may not get reclaimed
Kathey Marsden made changes - 13/Feb/09 01:04 AM
Kathey Marsden made changes - 13/Feb/09 04:41 PM
another space reclamation issue.
Kathey Marsden made changes - 13/Feb/09 04:41 PM
another space reclamation issue.
Kathey Marsden made changes - 13/Feb/09 04:42 PM
Kathey Marsden made changes - 13/Feb/09 04:43 PM
another space reclamation issue.
10.5 rel mgnt work: release note generator choked on missing quotes around href.
Myrna van Lunteren made changes - 18/Mar/09 09:40 PM
Myrna van Lunteren made changes - 04/May/09 06:22 PM
Dag H. Wanvik made changes - 30/Jun/09 04:02 PM
Dag H. Wanvik made changes - 01/Jul/09 01:42 PM
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
[C:/kmarsden/repro/clobgrowth] java ClobGrowth
Iterations to perform = 10000
Derby database created from main thread
New thread started
Derby connection from new thread
DeployThread update:1000
update:1000
DeployThread update:2000
update:2000
DeployThread update:3000
update:3000
DeployThread update:4000
update:4000
DeployThread update:5000
update:5000
DeployThread update:6000
update:6000
DeployThread update:7000
update:7000
DeployThread update:8000
update:8000
DeployThread update:9000
update:9000
SELECT * FROM new org.apache.derby.diag.SpaceTable('APP','CLOBTAB') t
CONGLOMERATENAME
|ISIND&|NUMALLOCATEDPAGES |NUMFREEPAGES |NUMUNFILLEDPAGES |PAGESIZE |ESTIMSPACESAVING
------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------
CLOBTAB
|0 |11181 |2 |1 |32768 |65536
SQL090205165714850
|1 |1 |0 |1 |4096 |0
Normal shutdown
Test complete
on 10.4 and earlier, it still grows , but slower to about 33MB on 10.4
Iterations to perform = 10000
Derby database created from main thread
New thread started
Derby connection from new thread
DeployThread update:1000
update:1000
DeployThread update:2000
update:2000
DeployThread update:3000
update:3000
DeployThread update:4000
update:4000
DeployThread update:5000
update:5000
DeployThread update:6000
update:6000
DeployThread update:7000
update:7000
DeployThread update:8000
update:8000
DeployThread update:9000
update:9000
SELECT * FROM new org.apache.derby.diag.SpaceTable('APP','CLOBTAB') t
CONGLOMERATENAME
|ISIND&|NUMALLOCATEDPAGES |NUMFREEPAGES |NUMUNFILLEDPAGES |PAGESIZE |ESTIMSPACESAVING
------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------
CLOBTAB
|0 |973 |2 |1 |32768 |65536
SQL090205165416880
|1 |1 |0 |1 |4096 |0
Normal shutdown
Test complete
[C:/kmarsden/repro/clobgrowth]