Issue Details (XML | Word | Printable)

Key: DERBY-3347
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Critical Critical
Assignee: Knut Anders Hatlen
Reporter: Bogdan Calmac
Votes: 2
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Derby

ERROR XSDB3: Container information cannot change once written

Created: 23/Jan/08 10:36 PM   Updated: Tuesday 08:16 PM
Return to search
Component/s: Store
Affects Version/s: 10.3.1.4, 10.3.2.1
Fix Version/s: 10.3.3.0, 10.4.1.3, 10.5.1.1

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works d3347-1a.diff 2008-04-10 11:39 AM Knut Anders Hatlen 18 kB
Text File Licensed for inclusion in ASF works d3347-1a.stat 2008-04-10 11:39 AM Knut Anders Hatlen 0.3 kB
Text File Licensed for inclusion in ASF works d3347-1b.diff 2008-04-11 08:54 AM Knut Anders Hatlen 18 kB
Text File Licensed for inclusion in ASF works d3347-2a.diff 2008-04-10 12:53 PM Knut Anders Hatlen 3 kB
Text File Licensed for inclusion in ASF works d3347-preview.diff 2008-04-09 01:38 PM Knut Anders Hatlen 14 kB
Text File Licensed for inclusion in ASF works d3347-preview.diff 2008-04-08 03:51 PM Knut Anders Hatlen 14 kB
Java Source File Licensed for inclusion in ASF works FileOffset.java 2008-04-16 07:23 AM Knut Anders Hatlen 2 kB
Zip Archive Licensed for inclusion in ASF works MiniStress.zip 2008-04-13 01:39 PM Kathey Marsden 3 kB
HTML File Licensed for inclusion in ASF works releaseNote.html 2008-04-29 01:26 PM Knut Anders Hatlen 5 kB
Environment:
Windows 2003 Server
Sun Java 1.6.0_03
Issue Links:
Duplicate
 
Reference
 

Issue & fix info: Release Note Needed
Bug behavior facts: Regression
Resolution Date: 14/Apr/08 10:17 AM


 Description  « Hide
We are using derby as an embedded DB for our data collection server. During an endurance test when we do around 270 inserts and 9 updates per second, for about a week, I ocasionally see the error below in the deby log (and nothing else beside this).

This is a vanilla installation, we run derby embedded with no extra configuration. I can confirm that there is no memory problem, the heap usage seems constant over time.

Can somebody provide some more information regarding the effects of this error? By looking at the stacktrace, it looks like a checkpoint operation is aborted due to some inconsistency in the internal data structure. If the error does not repeat immediately, does it mean that the next checkpoint is successful and there is no data loss?

I can't provide a test case for that, the error happens after about 1-2 day of running our software. I will rerun the test with the debug jars to capture the line numbers in the stacktrace. Also, I'm starting another test with 10.2.2.0, to see if this problem was introduced in the latest version.

There are another two bugs referring to this error, (https://issues.apache.org/jira/browse/DERBY-2284 and https://issues.apache.org/jira/browse/DERBY-3087) but they seem to happen in response to some client action. This use case is a bit different, the client keeps inserting and updating records for several days in a steady manner and at some point the error pops up.


And lastly, here is the exception:


Checkpoint Daemon caught standard exception

------------ BEGIN ERROR STACK -------------

ERROR XSDB3: Container information cannot change once written: was 0, now 80
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.store.raw.data.AllocPage.WriteContainerInfo(Unknown Source)
at org.apache.derby.impl.store.raw.data.FileContainer.writeHeader(Unknown Source)
at org.apache.derby.impl.store.raw.data.RAFContainer.writeRAFHeader(Unknown Source)
at org.apache.derby.impl.store.raw.data.RAFContainer.clean(Unknown Source)
at org.apache.derby.impl.services.cache.CachedItem.clean(Unknown Source)
at org.apache.derby.impl.services.cache.Clock.cleanCache(Unknown Source)
at org.apache.derby.impl.services.cache.Clock.cleanAll(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.checkpoint(Unknown Source)
at org.apache.derby.impl.store.raw.log.LogToFile.checkpointWithTran(Unknown Source)
at org.apache.derby.impl.store.raw.log.LogToFile.checkpoint(Unknown Source)
at org.apache.derby.impl.store.raw.RawStore.checkpoint(Unknown Source)
at org.apache.derby.impl.store.raw.log.LogToFile.performWork(Unknown Source)
at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(Unknown Source)
at org.apache.derby.impl.services.daemon.BasicDaemon.work(Unknown Source)
at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
at java.lang.Thread.run(Thread.java:619)


------------ END ERROR STACK -------------



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Bogdan Calmac added a comment - 24/Jan/08 02:39 PM
All right, I was able to reproduce the problem using the debug jars for 10.3.2.1, the error happened 4 times over last night. The 10.2.2.0 test went fine so far, so this might be a problem introduced in 10.3. Another thing I want to mention is that the test runs on a quad core CPU, so this could be a concurrency issue. And now here is the stacktrace with line numbers:

Checkpoint Daemon caught standard exception

------------ BEGIN ERROR STACK -------------

ERROR XSDB3: Container information cannot change once written: was 0, now 80
at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:301)
at org.apache.derby.impl.store.raw.data.AllocPage.WriteContainerInfo(AllocPage.java:575)
at org.apache.derby.impl.store.raw.data.FileContainer.writeHeader(FileContainer.java:900)
at org.apache.derby.impl.store.raw.data.RAFContainer.writeRAFHeader(RAFContainer.java:690)
at org.apache.derby.impl.store.raw.data.RAFContainer.clean(RAFContainer.java:534)
at org.apache.derby.impl.services.cache.CachedItem.clean(CachedItem.java:173)
at org.apache.derby.impl.services.cache.Clock.cleanCache(Clock.java:1381)
at org.apache.derby.impl.services.cache.Clock.cleanAll(Clock.java:619)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.checkpoint(BaseDataFileFactory.java:1202)
at org.apache.derby.impl.store.raw.log.LogToFile.checkpointWithTran(LogToFile.java:1521)
at org.apache.derby.impl.store.raw.log.LogToFile.checkpoint(LogToFile.java:1360)
at org.apache.derby.impl.store.raw.RawStore.checkpoint(RawStore.java:440)
at org.apache.derby.impl.store.raw.log.LogToFile.performWork(LogToFile.java:3411)
at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(BasicDaemon.java:331)
at org.apache.derby.impl.services.daemon.BasicDaemon.work(BasicDaemon.java:668)
at org.apache.derby.impl.services.daemon.BasicDaemon.run(BasicDaemon.java:394)
at java.lang.Thread.run(Thread.java:619)


------------ END ERROR STACK -------------

Knut Anders Hatlen added a comment - 24/Jan/08 03:36 PM
Just guessing here... The numbers 0 and 80 make me suspect that this is an issue with borrowedSpace. I think those are the only valid values for borrowedSpace. DERBY-3099 made some changes to the space calculation in AllocPage and may be the culprit. It was introduced in 10.3.2.1. Though, DERBY-2284 is reported against 10.3.1.4, so it might be something else that's causing it. Bogdan, is it possible for you to run with Derby 10.3.1.4 and see if the same happens there?

There is also another bug, DERBY-3116, which mentions incorrect space calculation involving borrowedSpace. It has a patch attached. If it's not too much trouble for you, it would be very interesting if you could download the 10.3.2.1 sources, apply the DERBY-3116 patch and see if that fixes your problem.

Bogdan Calmac added a comment - 24/Jan/08 05:20 PM
Sure, I'll go ahead with your suggestions and let you know how it goes.

Bogdan Calmac added a comment - 28/Jan/08 03:31 PM
All right, I've run the endurance test for 10.3.1.4 and 10.3.2.1 + 3116 and the error is still there, numerous times.

This is a pretty serious issue in terms of reliability. Another thing is that after a bunch of these errors the server blows up with JDBC exceptions (probably because the failed checkpoints).

Knut Anders Hatlen added a comment - 29/Jan/08 10:57 AM
Thanks for taking the time to run these experiments, Bogdan!

At least, now we know the problem is not caused by DERBY-3099 or DERBY-3116, and we know that it must have been introduced some time after 10.2.2.0 and before 10.3.1.4.

One thing I know was changed in that area of the code in that period, is that RAFContainer started using java.nio.* to allow parallel reads and writes on JVMs that support nio. It is possible to force the use of the old mechanism by deleting or commenting out the following lines in java/engine/org/apache/derby/modules.properties and rebuilding derby.jar:

# store data using a StorageFactory
# Enhanced version using NIO API; requires Java 1.4
derby.module.rawStore.data.genericJ4=org.apache.derby.impl.store.raw.data.BaseDataFileFactoryJ4
derby.env.jdk.rawStore.data.genericJ4=4
derby.env.classes.rawStore.data.genericJ4=java.nio.Buffer
cloudscape.config.rawStore.data.genericJ4=derby

It would be great if you could make that change and rerun your test.

Knut Anders Hatlen added a comment - 31/Jan/08 11:24 PM
Marking the issue as a regression since it is reported to work on Derby 10.2.2.0.

Bogdan Calmac added a comment - 01/Feb/08 11:03 PM
I've started the test for 10.3.2.1 without NIO and in the first 8 hours I got the exception below. I will have to stop the test tonight and restart it on Monday. I'll keep you posted how it goes.

2008-02-01 20:31:59.903 GMT Thread[derby.rawStoreDaemon,5,derby.daemons] Cleanup action starting
java.lang.NullPointerException
at org.apache.derby.impl.store.access.btree.ControlRow.getControlRowForPage(ControlRow.java:878)
at org.apache.derby.impl.store.access.btree.ControlRow.get(ControlRow.java:844)
at org.apache.derby.impl.store.access.btree.ControlRow.get(ControlRow.java:820)
at org.apache.derby.impl.store.access.btree.BTreePostCommit.purgeRowLevelCommittedDeletes(BTreePostCommit.java:462)
at org.apache.derby.impl.store.access.btree.BTreePostCommit.performWork(BTreePostCommit.java:278)
at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(BasicDaemon.java:331)
at org.apache.derby.impl.services.daemon.BasicDaemon.work(BasicDaemon.java:668)
at org.apache.derby.impl.services.daemon.BasicDaemon.run(BasicDaemon.java:394)
at java.lang.Thread.run(Thread.java:619)
Cleanup action completed

Bogdan Calmac added a comment - 22/Feb/08 03:21 PM
Since 10.3.2.1 without NIO is not a tested combination as illustrated by the exception above (so I wouldn't use it in productions), I continued testing 10.3.2.1 and 10.2.2.0 side by side.

One interesting thing that I noticed is that the test is successful if the maximum heap memory is small. With only 128M, 10.3.2.1 did not fail for a week. But with 800M (or 512M in the original tests) the error comes up consistently. Could it be related to weak/soft references?

Is there somebody familiar enough with the code to try to look into this at code level? This seems like a pretty important reliability bug to me, so I will try to debug the code, but it will be a while before I become comfortable with the code base.

Dyre Tjeldvoll added a comment - 06/Mar/08 08:44 PM
The call stack in your comment from 2008-02-01 looks like DERBY-3362

Kathey Marsden added a comment - 12/Mar/08 11:00 PM
Bogden, can you try the latest on the 10.3 branch, now that DERBY-3362 has been fixed
and see if you still see the problem?

Bogdan Calmac added a comment - 12/Mar/08 11:13 PM
Sure, I can try it, but DERBY-3362 has probably nothing to do with the original problem. My comment from Feb 1st refers to another issue that prevented me from running the test.

Bogdan Calmac added a comment - 12/Mar/08 11:16 PM
Would it be valuable to still disable NIO at code level or should I use the unmodified 10.3 branch?

Kathey Marsden added a comment - 12/Mar/08 11:19 PM
I think it would be valuable to try with NIO disabled. As I understand it you were blocked from doing that experiment because of DERBY-3362

Bogdan Calmac added a comment - 12/Mar/08 11:36 PM
Yes, that is correct. OK.

Bogdan Calmac added a comment - 24/Mar/08 01:55 PM
I have run my test for a week with the 10.3.2.1 SVN and NIO disabled and the error did not occur. So then we can throw a guess that the issue is related to concurrency in the new code based on NIO.

Kathey Marsden added a comment - 02/Apr/08 08:06 PM
There is another user that may be hitting this same issue. The trace is
ERROR XSDB3: Container information cannot change once written: was 0, now 80
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.store.raw.data.AllocPage.WriteContainerInfo(Unknown Source)
at org.apache.derby.impl.store.raw.data.FileContainer.writeHeader(Unknown Source)
at org.apache.derby.impl.store.raw.data.RAFContainer.writeRAFHeader(Unknown Source)
at org.apache.derby.impl.store.raw.data.RAFContainer.clean(Unknown Source)
at org.apache.derby.impl.services.cache.CachedItem.clean(Unknown Source)
at org.apache.derby.impl.services.cache.Clock.cleanCache(Unknown Source)
at org.apache.derby.impl.services.cache.Clock.cleanAll(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.checkpoint(Unknown Source)
at org.apache.derby.impl.store.raw.log.LogToFile.checkpointWithTran(Unknown Source)
at org.apache.derby.impl.store.raw.log.LogToFile.checkpoint(Unknown Source)
at org.apache.derby.impl.store.raw.RawStore.checkpoint(Unknown Source)
at org.apache.derby.impl.store.raw.log.LogToFile.performWork(Unknown Source)
at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(Unknown Source)
at org.apache.derby.impl.services.daemon.BasicDaemon.work(Unknown Source)
at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
at java.lang.Thread.run(Thread.java:801)

Would it be worthwhile to have a runtime option (something you can put in the derby.properties) to disable NIO or does anyone have ideas on how to proceed with this issue?

Myrna van Lunteren added a comment - 02/Apr/08 10:14 PM
I'd think an option to disable the NIO as a workaround until something better can be found may be helpful to the users hitting this. (Although possibly not so helpful to whomever will pick this up to fix).

Knut Anders Hatlen added a comment - 08/Apr/08 11:54 AM
I think I see how this can happen.

When we read or write through the page cache, we use FileChannels to
allow multiple threads to access the same data file in parallel (after
DERBY-801). The first alloc page in a file can also be accessed
directly by the container object (via the container cache), in which
case RandomAccessFile.seek() + read/write is used instead of
FileChannel.

Theoretically, this should work because the FileChannel operations use
absolute positions and should therefore not influence the positioning
of the read/write performed by the container object, and
synchronization ensures that RAF.seek+read/write only happens in one
thread at a time. However, it seems like on some platforms[1] the
position of the RandomAccessFile object can be changed when operations
using absolute positioning are performed on the FileChannel object
returned by getChannel() on the RAF. I guess this is to be expected,
since FileChannel's javadoc only guarantees that the operations on a
FileChannel object are thread safe, not that a mix of operations on
FileChannel objects and RandomAccessFile objects is thread safe.

So what I think is happening, is that the check point code is cleaning
the container cache. At the same time, the page cache initiates the
cleaning of a page in the same container as the container cache is
cleaning, resulting in a FileChannel operation which changes the
position of the RandomAccessFile used by the container cache. The
subsequent call to RandomAccessFile.readFully() therefore reads data
from the wrong position in the file. When the container cache later
attempts to write the data back to the file, the inconsistency is
detected and the XSDB3 error is raised.

I believe that we can fix this issue by rewriting the parts that don't
yet use FileChannel, so that FileChannel is used consistently and the
thread-safety guarantees in FileChannel's javadoc apply.

[1] Some platforms == Windows. I've also tested it on Linux and
Solaris, and there the position on the RandomAccessFile doesn't seem
to be affected by operations using absolute positions.

Thomas Nielsen added a comment - 08/Apr/08 01:02 PM - edited
Knut,

Do you think that this, or a rather similar scenario, could also be the reason for some of the checksum inconsistencies that have been reported lately?
I'm envisioning a scenario where instead of a parallel cache clean, there's a parallel flush and/or checksum calculation performed by the checkpoint and file cache, and ultimately causing the wrong checksum to be flushed to disk?

Most inconsistencies seem to be related to windows, and different RandomAccessFile behaviour may possibly be the culprit?

Just an idea...

Dyre Tjeldvoll added a comment - 08/Apr/08 01:23 PM
Knut, I see that you have assigned yourself to this issue. Does that mean you have a fix in the pipe line? If it could be merged to 10.4 fairly soon, I think we should consider spinning another RC with this fix...

Knut Anders Hatlen added a comment - 08/Apr/08 01:53 PM
Thomas:
> Do you think that this, or a rather similar scenario, could also be the reason for some of the checksum inconsistencies that have been reported lately?

Yes, that could be the reason, at least if the checksum inconsistencies happen on page 0 (which is the first alloc page). There also are a lot of reports about EOFException thrown by RandomAccessFile.readFully() in FileContainer.getEmbryonicPage() (all seen on Windows only) which I think are likely caused by the same problem.

Dyre: Yes, I have a patch in the pipeline. I haven't tested it yet, but I'll do so and post it tomorrow. I'll leave it up to you to decide whether it is worth a new RC. If it fixes these problems, I think it is worth it, but it is probably difficult to verify the fix that quickly.

Knut Anders Hatlen added a comment - 08/Apr/08 03:51 PM
Here's a preview patch for those who are interested. It is not ready for commit (the regression tests haven't completed yet, the code needs more comments, and some old comments need to be updated), but it should give an impression of how I'm planning to solve it. Basically, it factors out the occurrences of RandomAccessFile.seek()+read/write in FileChannel into separate methods that can be overridden by the RAFContainer4 class and replaced with FileChannel calls. I also needed to change the way RAFContainer4 initialized the field ourChannel, since with these changes it is too late to initialize it at the end of createContainer() and openContainer().

Knut Anders Hatlen added a comment - 09/Apr/08 09:24 AM
Another related problem:

Before DERBY-801, there would be only one thread reading/writing the same data file at a time. After DERBY-801, there could be many threads accessing the same data file concurrently, but this should be OK since they would access different pages (read/write to the same page will be serialized by BasePage.setExclusive()). There is however no mechanism, as far as I can see, that prevents concurrent updates of the container information in page 0 and normal page updates that touch page 0. So I think it is possible that updates to the container information section of page 0 are lost because concurrent updates of AllocPage 0 overwrite the new information with the old value. We probably need to find a way to prevent AllocPage 0 from being cleaned while the container is being cleaned (that is, RAFContainer4.writePage(pageId=0) and RAFContainer.writeRAFHeader() must be serialized somehow).

Knut Anders Hatlen added a comment - 09/Apr/08 09:30 AM
The preview patch caused two failures in derbyall on Solaris 10 with JDK 6. unit/recoveryTest.unit and store/dropcrash2.java failed with an assert failure in AllocPage.ReadContainerInfo(): ASSERT FAILED N not what is expected : 0

Knut Anders Hatlen added a comment - 09/Apr/08 01:38 PM
Attaching an updated preview patch, still not ready for commit. The updated patch fixes the assert failures seen in the recovery tests. The problem was that the new methods in RAFContainer4 used the FileChannel associated with RAFContainer.fileData instead of the FileChannel associated with the file argument. The fileData field and the file argument are normally the same object, but not when a container is stubbified, so redo of DROP TABLE operations would fail. Now the recovery tests pass.

Knut Anders Hatlen added a comment - 10/Apr/08 11:39 AM
I've added more comments and attached a new patch (d3347-1a) which is ready for review. It only addresses the problem with the mix of old-style I/O and NIO on some platforms. The problem with multiple threads accessing page 0 concurrently will have to be fixed in a follow-up patch.

Knut Anders Hatlen added a comment - 10/Apr/08 12:53 PM
I'm also attaching a patch (d3347-2a) for the problem with concurrent I/O operations on page 0.

This patch adds a wrapper around the methods readPage() and writePage() in RAFContainer4. If the page that is accessed is FileContainer.FIRST_ALLOC_PAGE_NUMBER, the method calls are synchronized on the container object, and therefore behave just like their non-NIO counterparts in RAFContainer. Since the code that accesses the container information on the first alloc page also is synchronized on the container object, this means that we never have two threads accessing the first alloc page concurrently. For the rest of the pages, the synchronization is not necessary, since they can only be accessed through the page cache and therefore no two threads access the same page concurrently.

I haven't run the regression tests on this patch yet.

Dag H. Wanvik added a comment - 11/Apr/08 12:59 AM
I don't know this code area very well, so caveats apply ;)

- Patch 1: Analysis that some read/writes use FileChannel to access
  first page while others use the RandomAccessFile methods directly
  (and hence no guarantee for serializability) seems correct to
  me. Refactoring to remedy that looks good.

  The cleanup on use/initialization of FileChannel is good, too.

  RAFContainer4#write has lost a check for getCommittedDropState(). Is
  this intentional?

  Naming: FileContainer#readHeader doesn't do any reading any longer,
  since it no longer calls getEmbryonicPage. This is slightly
  confusing.

  BTW: (not introduced by this patch, but I noticed so I thought I'd
  mention it) RAFContainer4 (in contrast to RAFContainer) allocates an
  encryption buffer for every page write. Guess it needs to since the
  write method is not synchronized (and can thus not use the dedicated
  buffer safely). Since such objects can be large, maybe a pool would be good?


- Patch 2:
  Although after patch 1, writeRAFHeader always uses FileChannel I/O
  now, I agree there is a chance for the race you suggest. The patch
  is simple and looks good to me.
  +1

Good detective work, Knut!


Knut Anders Hatlen added a comment - 11/Apr/08 08:50 AM
Thanks for reviewing the patch, Dag!

> RAFContainer4#write has lost a check for getCommittedDropState(). Is
> this intentional?

Good catch! No, it wasn't intentional. The new patch (1b) reintroduces the check, and also reintroduces the assert for !getCommittedDropState() in readPage().

> Naming: FileContainer#readHeader doesn't do any reading any longer,
> since it no longer calls getEmbryonicPage. This is slightly
> confusing.

Agreed, I think. Although the two methods it still invokes are called ReadContainerInfo() and readHeaderFromArray(), so I think there is some justification for keeping "read" in its name. If we come up with a better name, we can always rename it later.

> BTW: (not introduced by this patch, but I noticed so I thought I'd
> mention it) RAFContainer4 (in contrast to RAFContainer) allocates an
> encryption buffer for every page write. Guess it needs to since the
> write method is not synchronized (and can thus not use the dedicated
> buffer safely). Since such objects can be large, maybe a pool would be good?

(I guess you meant "every encrypted page write", not "every page write"...)

Yes, that was also commented in the review of DERBY-801, and DERBY-2086 was logged for it.

Knut Anders Hatlen added a comment - 11/Apr/08 08:54 AM
The 1b patch replaces the 1a patch. It only contains one change: The checks/asserts for getCommittedDropState() are reintroduced in readPage()/writePage().

Knut Anders Hatlen added a comment - 11/Apr/08 09:47 AM
I committed the 2a patch to trunk with revision 647091.

I have started the regression tests on the 1b patch as well. The changes from 1a are minimal, so I plan to commit it if the tests pass.

Dag H. Wanvik added a comment - 11/Apr/08 11:21 AM
> Agreed, I think. Although the two methods it still invokes are called
> ReadContainerInfo() and readHeaderFromArray(), so I think there is
> some justification for keeping "read" in its name. If we come up with
> a better name, we can always rename it later.

Yes, I notices there are several existing methods in there that are
called read<Something> and write<Something> that do no I/O, but just
analyze or format bytes arrays. It would be good to cleanup this code
some, but I am OK with deferring for now.

This comment in the StorageRandomAccessFile abstraction is also less
precise/correct after the advent of RAFContainer4 as a user of
StorageRandomAccessFile:

 "An implementation of StorageRandomAccessFile need not be thread
 safe. The database engine single-threads access to each
 StorageRandomAccessFile instance. Two threads will not access the
 same StorageRandomAccessFile instance at the same time."

After you reintroduced the getCommittedDropState checks,
+1 to the first patch as well!

Knut Anders Hatlen added a comment - 11/Apr/08 12:52 PM
Thanks again, Dag. Patch 1b has been committed with revision 647139.

I agree that the code and comments in this area would benefit from an overhaul. The abstractions in the original code were obviously chosen to match the RandomAccessFile way of accessing the data files, and the NIO code doesn't fit very well into that model. Some of the comments added as part of DERBY-801 also mention that a restructuring of the interfaces would be beneficial.

Regarding the comment in StorageRandomAccessFile, I think it's correct, actually. The only parts of a StorageRandomAccessFile that need to be thread safe, are the methods on the FileChannel object returned by the getChannel() method. However, the SRAF interface doesn't know anything about FileChannel or getChannel(), it's just RAFContainer4 that knows that some SRAF implementations are subclasses of java.io.RAF, and therefore manages to retrieve FileChannels from them. None of the SRAF methods are ever called concurrently on the same object.

Knut Anders Hatlen added a comment - 11/Apr/08 01:04 PM
I guess we should try to back-port the fix at least to 10.4. It would be great if we could get the fix verified first, but lacking a reliable repro that may be difficult.

Dyre Tjeldvoll added a comment - 11/Apr/08 02:45 PM
+1 to merging to 10.4 (I added 10.4.1.2 as a fix-version, and have canceled the ongoing vote on 10.4.1.1 on derby-dev).

Knut Anders Hatlen added a comment - 11/Apr/08 09:18 PM
I have merged the fixes to the 10.4 branch (revisions 647310 and 647312).
I'm leaving the issue open for now, since the fix hasn't been verified.

Bogdan Calmac added a comment - 11/Apr/08 09:56 PM
I can rerun my endurance test next week using 10.4.

Knut Anders Hatlen added a comment - 11/Apr/08 11:04 PM
Thanks Bogdan, that would be great!

Kathey Marsden added a comment - 12/Apr/08 04:29 AM
In working on another issue I hit this bug and thought I would post my program in case it can be expanded to become a more reliable reproduction. You run it with the number of threads as an argument and might want to bump up memory e.g.

java -Xmx512M MiniStress 50

I ran the program once to the end without error and then reran on the same db and got the error. I haven't yet updated up to get the fix.
I ran it on my dual core Windows XP box with Sun jdk 1.6. Maybe on a box with more CPU's it will pop the bug more reliably.

Here is the trace from the derby.log

------------ BEGIN SHUTDOWN ERROR STACK -------------


ERROR XSDG1: Page Page(1039,Container(0, 5856)) could not be written to disk, please check if disk is full.

at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:296)

at org.apache.derby.impl.store.raw.data.CachedPage.writePage(CachedPage.java:824)

at org.apache.derby.impl.store.raw.data.CachedPage.clean(CachedPage.java:605)

at org.apache.derby.impl.services.cache.ConcurrentCache.cleanAndUnkeepEntry(ConcurrentCache.java:551)

at org.apache.derby.impl.services.cache.ClockPolicy.rotateClock(ClockPolicy.java:476)

at org.apache.derby.impl.services.cache.ClockPolicy.insertEntry(ClockPolicy.java:176)

at org.apache.derby.impl.services.cache.ConcurrentCache.insertIntoFreeSlot(ConcurrentCache.java:208)

at org.apache.derby.impl.services.cache.ConcurrentCache.find(ConcurrentCache.java:284)

at org.apache.derby.impl.store.raw.data.FileContainer.getUserPage(FileContainer.java:2412)

at org.apache.derby.impl.store.raw.data.FileContainer.getPage(FileContainer.java:2462)

at org.apache.derby.impl.store.raw.data.BaseContainerHandle.getPage(BaseContainerHandle.java:319)

at org.apache.derby.impl.store.access.btree.ControlRow.get(ControlRow.java:833)

at org.apache.derby.impl.store.access.btree.ControlRow.get(ControlRow.java:820)

at org.apache.derby.impl.store.access.btree.BTreePostCommit.doShrink(BTreePostCommit.java:123)

at org.apache.derby.impl.store.access.btree.BTreePostCommit.performWork(BTreePostCommit.java:249)

at org.apache.derby.impl.store.raw.xact.Xact.postTermination(Xact.java:2045)

at org.apache.derby.impl.store.raw.xact.Xact.completeCommit(Xact.java:818)

at org.apache.derby.impl.store.raw.xact.Xact.commit(Xact.java:854)

at org.apache.derby.impl.store.raw.xact.Xact.commit(Xact.java:649)

at org.apache.derby.impl.store.access.RAMTransaction.commit(RAMTransaction.java:1964)

at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.doCommit(GenericLanguageConnectionContext.java:1211)

at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.userCommit(GenericLanguageConnectionContext.java:1031)

at org.apache.derby.impl.jdbc.TransactionResourceImpl.commit(TransactionResourceImpl.java:237)

at org.apache.derby.impl.jdbc.EmbedConnection.commit(EmbedConnection.java:1661)

at WorkerThread.run(WorkerThread.java:24)

Caused by: java.nio.channels.ClosedChannelException

at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:91)

at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:642)

at org.apache.derby.impl.store.raw.data.RAFContainer4.writeFull(RAFContainer4.java:397)

at org.apache.derby.impl.store.raw.data.RAFContainer4.writePage(RAFContainer4.java:291)

at org.apache.derby.impl.store.raw.data.CachedPage.writePage(CachedPage.java:782)

... 23 more

============= begin nested exception, level (1) ===========

java.nio.channels.ClosedChannelException

at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:91)

at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:642)

at org.apache.derby.impl.store.raw.data.RAFContainer4.writeFull(RAFContainer4.java:397)

at org.apache.derby.impl.store.raw.data.RAFContainer4.writePage(RAFContainer4.java:291)

at org.apache.derby.impl.store.raw.data.CachedPage.writePage(CachedPage.java:782)

at org.apache.derby.impl.store.raw.data.CachedPage.clean(CachedPage.java:605)

at org.apache.derby.impl.services.cache.ConcurrentCache.cleanAndUnkeepEntry(ConcurrentCache.java:551)

at org.apache.derby.impl.services.cache.ClockPolicy.rotateClock(ClockPolicy.java:476)

at org.apache.derby.impl.services.cache.ClockPolicy.insertEntry(ClockPolicy.java:176)

at org.apache.derby.impl.services.cache.ConcurrentCache.insertIntoFreeSlot(ConcurrentCache.java:208)

at org.apache.derby.impl.services.cache.ConcurrentCache.find(ConcurrentCache.java:284)

at org.apache.derby.impl.store.raw.data.FileContainer.getUserPage(FileContainer.java:2412)

at org.apache.derby.impl.store.raw.data.FileContainer.getPage(FileContainer.java:2462)

at org.apache.derby.impl.store.raw.data.BaseContainerHandle.getPage(BaseContainerHandle.java:319)

at org.apache.derby.impl.store.access.btree.ControlRow.get(ControlRow.java:833)

at org.apache.derby.impl.store.access.btree.ControlRow.get(ControlRow.java:820)

at org.apache.derby.impl.store.access.btree.BTreePostCommit.doShrink(BTreePostCommit.java:123)

at org.apache.derby.impl.store.access.btree.BTreePostCommit.performWork(BTreePostCommit.java:249)

at org.apache.derby.impl.store.raw.xact.Xact.postTermination(Xact.java:2045)

at org.apache.derby.impl.store.raw.xact.Xact.completeCommit(Xact.java:818)

at org.apache.derby.impl.store.raw.xact.Xact.commit(Xact.java:854)

at org.apache.derby.impl.store.raw.xact.Xact.commit(Xact.java:649)

at org.apache.derby.impl.store.access.RAMTransaction.commit(RAMTransaction.java:1964)

at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.doCommit(GenericLanguageConnectionContext.java:1211)

at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.userCommit(GenericLanguageConnectionContext.java:1031)

at org.apache.derby.impl.jdbc.TransactionResourceImpl.commit(TransactionResourceImpl.java:237)

at org.apache.derby.impl.jdbc.EmbedConnection.commit(EmbedConnection.java:1661)

at WorkerThread.run(WorkerThread.java:24)

============= end nested exception, level (1) ===========




------------ END SHUTDOWN ERROR STACK -------------





Mike Matrigali added a comment - 12/Apr/08 05:59 AM
The most likely way this bug will show up is concurrent writes to page 0. But the bug may not be recognized for a long time as it only corrupts the on disk page, while the page is cache is likely
fine (i am not sure of this but seems possible). This bug in particular affects page 0, which happens to be accessed a lot and thus is in cache a lot. So best way to catch it, is as Kathey observed -
run your concurrent test and then shutdown completely and then start up again and check. Easiest
way to check is just to run the consistency checker against any tables you might be testing against.

Kathey Marsden added a comment - 12/Apr/08 07:01 PM
Thanks Mike for the info. I updated the program to shutdown the db when done and check the tables. I can reproduce corruption in page 0 pretty reliably with 100 threads. I am going to update my client now and see if the problem goes away with the fix.


Kathey Marsden added a comment - 13/Apr/08 01:39 PM
I ran the program more than a dozen times without error with the fix after getting corruption 3/3 times without the fix. So the fix looks good from this case's perspective. Attached is the updated program that does the shutdown and table check MiniStress.zip. Maybe someone that runs stress tests could incorporate something similar.

Knut Anders Hatlen added a comment - 14/Apr/08 10:17 AM
Thanks for taking the time to test the fix, Kathey!
I haven't been able to reproduce the problem myself, but I'm marking the issue as resolved based on your testing. If Bogdan's testing shows that it's not fixed, we can reopen it.

Kathey Marsden added a comment - 14/Apr/08 04:29 PM
Knut, do you plan to backport this fix to 10.3?

Knut Anders Hatlen added a comment - 15/Apr/08 10:57 AM
Merged the fixes to the 10.3 branch and committed revision 648210.

Mike Matrigali added a comment - 15/Apr/08 11:16 PM
Trying to figure out what to tell 10.3 users that may have hit this. Does anyone have a db that they can boot that has a table that gets this error on access. Can you drop the index or table if it gets this error? If you get this error during recovery you are going to have to go to backup, but i am hoping that if the error is on an index we can figure out some way to just be able to drop and recreate the index.

How likely is this error to have caused more than page 0 to be corrupted? Does the unintended i/o interaction have a chance to push a write that was supposed to start at offset 0 to somewhere later thus causing it to overwrite page 1?

Knut Anders Hatlen added a comment - 16/Apr/08 07:23 AM
I'm not sure (since I haven't been able to reproduce the problem myself) but I don't think you'll be able to drop the table/index. I think DROP TABLE will open the container it's about to drop, and then it'll also read the container information stored in page 0.

I also think it is possible that the problem with mixing FileChannel operations and RandomAccessFile.seek() can cause corruption on any page in the container, although based on the reports, it seems like page 0 is the most likely one to be hit.

I'm attaching the program I used to track down the problem (no Derby code, only pure file operations), FileOffset.java. It runs 10 threads that write at random positions in a file using a FileChannel, while the main thread runs a loop with RAF.seek(0L) + sleep() + RAF.getFilePointer(). On Solaris and Linux, getFilePointer() always returns 0, but on Windows it frequently returns other values pointing at random positions in the file (only tested with Sun's JVMs). So it is quite possible that a write occurs at the wrong position and corrupts another page than page 0.

The other problem (concurrent access to page 0 by container cache and page cache) can only hit page 0, I believe.

Knut Anders Hatlen added a comment - 29/Apr/08 01:26 PM
Attaching a release note.

Bryan Pendleton added a comment - 29/Apr/08 01:38 PM
Is it true that this problem only affected Windows platforms? If so, would this be worth including in the note?

Knut Anders Hatlen added a comment - 29/Apr/08 02:14 PM
It is true that parts of the problem didn't seem to happen on Solaris and Linux, whereas it did on Windows (see the attached FileOffset.java). But I only tried it on a limited set of JVM's, so I can't say definitely that it's not a problem on any JVM on any other OS than Windows.

Also, the possibility of overwriting parts of page 0 with old data due to lack of synchronization between page writes and container meta-data writes, is not limited to a single platform.

Bogdan Calmac added a comment - 01/May/08 04:31 PM
I've run our endurance test for about a week using 10.4 svn 648059 and it was solid as a rock.

Knut Anders Hatlen added a comment - 01/May/08 05:23 PM
Thanks for testing it, Bogdan! I'm glad it solved your problems.