|
[
Permlink
| « Hide
]
Knut Anders Hatlen added a comment - 22/Jan/09 02:46 PM
Do you also have the stack trace for the exception?
Thank you for your reply.
I guess you mean an exception stack trace in derby.log? The log was deleted when I restored the database, so I will send it the next time the problem occurs. If you need something else than output from the derby.log, please specify. Yes, the exception stack trace in derby.log would be fine.
So, it has happened a couple of more times now, but when we look in the derby.log file, there is no exception. There are only the startup messages:
---------------------------------------------------------------- 2009-01-25 11:23:35.851 GMT: Booting Derby version The Apache Software Foundation - Apache Derby - 10.4.2.0 - (689064): instance a816c00e-011f-0d87-c36b-0000003a8778 on database directory /home/uwe/workspace/aifudis/aifudis.operator/javadb-activemq Database Class Loader started - derby.database.classpath='' ---------------------------------------------------------------- 2009-01-25 11:23:38.219 GMT: Booting Derby version The Apache Software Foundation - Apache Derby - 10.4.2.0 - (689064): instance 80840021-011f-0d87-c36b-0000003a8778 on database directory /home/uwe/workspace/aifudis/aifudis.operator/javadb-development Database Class Loader started - derby.database.classpath='' Please advise my next step. It looks like this error has statement severity, which is not logged by default, so the derby.stream.error.logSeverityLevel property needs to be set to 20000 or lower in order to get it logged in derby.log. Set it to 0 to get all exceptions logged. http://db.apache.org/derby/docs/10.4/tuning/rtunproper26985.html
Alternatively, you could modify your application to print the stack trace instead of just the error message. Any information about the transaction mix and the database schema would also be helpful. I have attached a derby.log for after the problem occurs. Please look at it and see if you can see how to solve the problem.
I'll start reproducing the problem and attach another derby.log with the first exception we get, which is different. Can you try to run with the Derby debug version?
Looking at your derby.log, the exception can be thrown in two places. By running with the debug build, we'll either get the line number of the exception, or possibly an assert error somewhere else. The debug version can be downloaded from the Apache Derby site. There two cases are: 1) There is not enough data in the page to read the header (2 bytes). 2) There is not enough data in the page to read the user data according to the encoded byte length in the header. Possible causes: o The page is somehow corrupt. o The header is wrong. o A bug where wrong information about the field data is used. (o I also note that JRuby is being used.) To investigate further, we need more information. The easiest thing to do is to run with the debug version to obtain line numbers (or an assert). I don't know too much about JRuby, but I assume you are not using JDBC directly? Also, is the data you are inserting ASCII only, or are you inserting for instance CJK or nordic characters? Another thing to do for those looking into this, could be to understand what exactly is going on in the BTree (i.e. comparePreviousRecord). Finally, in the description you wrote "It started yesterday". Does this mean that the application has been running successfully for a longer period of time? And on the different systems, are you running against the same database or regenerating it from scratch? comparePreviousRecord() is used by B-trees that back a unique constraint on columns that could be null (new feature in Derby 10.4). Some possible problems with that method were identified in
The original bug report mentioned another error message. If that error occurs again, it would be great if you could post that stack trace too. Here is a derby.log with debug information. I hope you can make sense out of it.
An additional information: We recently (less than 2 weeks ago), started persisting ActiveMQ messages using derby. ActiveMQ is running embedded in the same JVM, so we access 2 derby databases in the same JVM. > I don't know too much about JRuby, but I assume you are not using JDBC directly? We access JDBC using ActiveRecord, but I expect this should not make a difference. > Also, is the data you are inserting ASCII only, or are you inserting for instance CJK or nordic characters? Nordic characters are definitely possible. > Finally, in the description you wrote "It started yesterday". Does this mean that the application has been running successfully for a longer period of time? Yes, the application has been running for long durations earlier, but we constantly develop the application, so circumstances change, and we regularly recreate the development database. Two production databases are still running without problems, so something we changed in the last few days is triggering the error, but we are unable to identify any special changes. We are possibly exercising the database slightly more than earlier by deleting/inserting some rows that were updated earlier. > And on the different systems, are you running against the same database or regenerating it from scratch? All systems own their own embedded derby database. There is no database network access. Thanks.
Is it possible for you to post the schema of the database (javadb-development)? If that isn't possible, can you confirm if you are using a unique nullable constraint/index on one of the character columns? I can indeed confirm that we are using a unique nullable constraint (not index since it does not allow multiple rows with NULL) in a character column.
Are we close? Here is a slightly different derby.log, with a slightly different error message. I hope it helps.
My mistake: There are indeed nullable unique indexes on the orders table.
Do you still want the database schema? Can you please tell me the easiest way to dump it? We have ij available. Have you found a probable cause for the error? Well, there's no repro available to test against...
One step forward would be to determine if the error is made when writing the page/record or when it is being read back. As a work-around, maybe you could remove the constraint (or use just unique and insert unique values)? If that works, it strengthens the suspicion against the unique nullable code. Looks like the latest error also happened in the index backing the nullable unique constraint (compareLeftAndRightSiblings() is only used by that kind of index):
ERROR XSDA1: An attempt was made to access an out of range slot on a page at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:276) at org.apache.derby.impl.store.raw.data.BasePage.checkSlotOnPage(BasePage.java:1838) at org.apache.derby.impl.store.raw.data.BasePage.fetchFromSlot(BasePage.java:431) at org.apache.derby.impl.store.access.btree.BTreeController.compareNextRecord(BTreeController.java:454) at org.apache.derby.impl.store.access.btree.BTreeController.compareLeftAndRightSiblings(BTreeController.java:555) at org.apache.derby.impl.store.access.btree.BTreeController.doIns(BTreeController.java:812) The easiest way to dump the database schema is to use dblook.
$ java -jar derbyrun.jar dblook -d jdbc:derby:mydb -o dblook.txt Uwe Kubosch wrote:
----- Do you still want the database schema? ----- Yes, please. It would also help to know more about the data you are inserting, and the access pattern. The more information you provide, the better :) I attached a schema dump with the unique constraints.
I used the command mentioned in this issue. Is there a way to make the dump sorted the same way each time to enable easy diff of changes? Our usage pattern when triggering the error is typically to delete all rows in the orders table, and then insert about 1000 rows similar to the one in the description of this issue. I was able to reproduce the error with the attached program Derby4027Repro.java. To reproduce, create a database wombat and run the schema.sql script to set up the schema, then run the program. It will fail with
[C:/kmarsden/repro/derby-4027] java Derby4027Repro Exception in thread "main" java.sql.SQLException: An attempt was made to access an out of range slot on a page at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:201) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:391) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2201) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1323) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1648) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(EmbedPreparedStatement.java:294) at Derby4027Repro.main(Derby4027Repro.java:15) Caused by: java.sql.SQLException: An attempt was made to access an out of range slot on a page at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:11 9) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70) ... 9 more Caused by: ERROR XSDA1: An attempt was made to access an out of range slot on a page at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:276) at org.apache.derby.impl.store.raw.data.BasePage.checkSlotOnPage(BasePage.java:1830) at org.apache.derby.impl.store.raw.data.BasePage.fetchFromSlot(BasePage.java:431) at org.apache.derby.impl.store.access.btree.BTreeController.compareNextRecord(BTreeController.java:455) at org.apache.derby.impl.store.access.btree.BTreeController.compareLeftAndRightSiblings(BTreeController.java:556 ) at org.apache.derby.impl.store.access.btree.BTreeController.doIns(BTreeController.java:813) at org.apache.derby.impl.store.access.btree.BTreeController.insert(BTreeController.java:1261) at org.apache.derby.impl.store.access.btree.index.B2IController.insert(B2IController.java:210) at org.apache.derby.impl.sql.execute.IndexChanger.insertAndCheckDups(IndexChanger.java:439) at org.apache.derby.impl.sql.execute.IndexChanger.doInsert(IndexChanger.java:383) at org.apache.derby.impl.sql.execute.IndexChanger.insert(IndexChanger.java:589) at org.apache.derby.impl.sql.execute.IndexSetChanger.insert(IndexSetChanger.java:267) at org.apache.derby.impl.sql.execute.RowChangerImpl.insertRow(RowChangerImpl.java:453) at org.apache.derby.impl.sql.execute.InsertResultSet.normalInsertCore(InsertResultSet.java:1022) at org.apache.derby.impl.sql.execute.InsertResultSet.open(InsertResultSet.java:495) at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:416) at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:297) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1235) ... 3 more confirmed this is a regression and does not occur with 10.3.3.1
So now we have a way to reproduce it. Great! Thanks, Uwe and Kathey.
I'm unchecking the regression checkbox since the repro uses a feature that is not present in 10.3 -- unique constraint on possibly null columns. The stack trace shows that the exception is thrown when updating an index for such a constraint. The reason why it works on 10.3 is that adding the unique constraint failed when running schema.sql: ij> ALTER TABLE "AIFUDIS"."ORDERS" ADD CONSTRAINT "ORDERS_FLIGHT_NO_STD_INDEX" UNIQUE ("FLIGHT_NO", "STD"); ERROR 42831: 'FLIGHT_NO' cannot be a column of a primary key or unique key because it can contain null values. ij> ALTER TABLE "AIFUDIS"."ORDERS" ADD CONSTRAINT "ORDERS_ICAO_FLIGHT_NO_STD_INDEX" UNIQUE ("ICAO_FLIGHT_NO", "STD"); ERROR 42831: 'ICAO_FLIGHT_NO' cannot be a column of a primary key or unique key because it can contain null values. BTreeController.compareNextRecord() and comparePreviousRecord() are on a page boundary and therefore need to check the first record of the next page or the last record of the previous page, they don't check if the slot number is out of bounds when they go to the next/previous page. This causes problems if the neighbour page doesn't have any records. The attached patch (recheck-slot.diff) makes the methods recheck that the slot numbers are within bounds after moving to another page.
What I don't quite understand about the code, is that the record with slot number 0 is not used. That is, the code only looks at slots 1 ... (recordCount() - 1). Does anyone know why? Slot 0 is a valid slot, as far as I read the code, but if I change those methods to look at slot 0, they start throwing EOFException. I haven't run the regression tests, but at least the repro ran without errors with the patch. FWIW, all regression tests ran cleanly.
So, this patch will be included in 10.4.2.1 ?
In btree pages, the first row, ie. the row in slot 0 is a special row and not a normal key row. It is a control row that has whatever meta data is necessary for linking up the pages. There are 2 different control rows depending on whether it is a leaf or branch. See BranchControlRow.java and LeafControlRow.java for details on what is in each.
Thanks for the explanation, Mike.
Here's a new version of the patch. Same code changes as before, but now with comments. Also added a simplified version of Kathey's test case to NullableUniqueConstraintTest. Uwe, I'm planning to commit the fix to trunk so that it will be part of the upcoming 10.5.1 release (http://wiki.apache.org/db-derby/DerbyTenFiveOneRelease), and also to backport it to the 10.4 branch so that it will be part of the next 10.4 release if there will be one.
Excellent! Thank you!
Out of curiosity: What about version 10.5.0.0 ? Where can I read about what the parts of the derby version numbers mean? 10.5.0.0 is the version number used for everything that is checked in on the development branch (trunk) until we create a separate 10.5 stable branch. There won't be a 10.5.0.0 release, instead the version number is bumped to 10.5.1.x when we create the stable branch. Some more explanation here: http://db.apache.org/derby/papers/versionupgrade.html
Thank you for the explanation and the link.
Committed to trunk with revision 744984.
I merged the fix to the 10.4 branch and committed revision 745485.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||