Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 10.8.1.2
    • Fix Version/s: 10.8.1.2
    • Component/s: Store
    • Labels:
      None
    • Environment:
    • Urgency:
      Urgent
    • Bug behavior facts:
      Crash, Data corruption

      Description

      Steps to hit the issue:
      1. Create a derby db with a few tables
      2. Perform some db operations (there's going to be a mix of selects, updates, and inserts here across several tables).
      3. Kill the java process during the db operations
      4. Attempt to start the java process again.
      5. Repeat from (2) a few times.

      After a few iterations, starting derby fails with this exception:

      Exception in thread "main" java.sql.SQLException: Failed to start database '/Users/tim/jiras/mnk-2519/ObjectDataSynchronousWriteTest/objectdb/datadb' with class loader sun.misc.Launcher$AppClassLoader@61e63e3d, see the next exception for details.
      at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
      at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
      at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
      at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
      at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
      at org.apache.derby.impl.jdbc.EmbedConnection30.<init>(Unknown Source)
      at org.apache.derby.impl.jdbc.EmbedConnection40.<init>(Unknown Source)
      at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Unknown Source)
      at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
      at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
      at java.sql.DriverManager.getConnection(DriverManager.java:582)
      at java.sql.DriverManager.getConnection(DriverManager.java:154)
      at OpenDerby.main(OpenDerby.java:16)
      Caused by: java.sql.SQLException: Failed to start database '/Users/tim/jiras/mnk-2519/ObjectDataSynchronousWriteTest/objectdb/datadb' with class loader sun.misc.Launcher$AppClassLoader@61e63e3d, see the next exception for details.
      at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
      at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
      ... 13 more
      Caused by: java.sql.SQLException: An exception was thrown during transaction abort.
      at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
      at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
      at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
      at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
      ... 10 more
      Caused by: ERROR XSTB0: An exception was thrown during transaction abort.
      at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
      at org.apache.derby.impl.store.raw.xact.Xact.abort(Unknown Source)
      at org.apache.derby.impl.store.raw.xact.XactFactory.rollbackAllTransactions(Unknown Source)
      at org.apache.derby.impl.store.raw.log.LogToFile.recover(Unknown Source)
      at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
      at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
      at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
      at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
      at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
      at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
      at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
      at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
      at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
      at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
      at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
      at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
      at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
      at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
      at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
      at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
      at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
      at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
      at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
      ... 10 more
      Caused by: ERROR XSLA8: Cannot rollback transaction 2989, trying to compensate null operation with null
      at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
      at org.apache.derby.impl.store.raw.log.FileLogger.undo(Unknown Source)
      ... 32 more
      Caused by: ERROR XSLA1: Log Record has been sent to the stream, but it cannot be applied to the store (Object null). This may cause recovery problems also.
      at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
      at org.apache.derby.impl.store.raw.log.FileLogger.logAndUndo(Unknown Source)
      at org.apache.derby.impl.store.raw.xact.Xact.logAndUndo(Unknown Source)
      ... 33 more
      Caused by: ERROR XSDB0: Unexpected exception on in-memory page Page(1,Container(0, 1409))
      at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
      at org.apache.derby.impl.store.raw.data.StoredPage.storeRecordForInsert(Unknown Source)
      at org.apache.derby.impl.store.raw.data.StoredPage.storeRecord(Unknown Source)
      at org.apache.derby.impl.store.raw.data.PurgeOperation.undoMe(Unknown Source)
      at org.apache.derby.impl.store.raw.data.PhysicalUndoOperation.doMe(Unknown Source)
      ... 35 more

      The circumstances are essentially the same as what was in DERBY-5248, but the error seems to be a bit different. Opening a fresh issue since the nature of the problem is not quite clear. This issue can be merged if the root cause is found to be the same.

      I'm attaching the corrupted db along with the output from opening the db with a sane build. Please let me know if any more information is needed.

      1. crashed_db_11_18_2011.tar.gz
        1011 kB
        Tim Wu
      2. runtimelogrecorddump.jar
        1.04 MB
        Mike Matrigali
      3. fulllogrecorddump.jar
        494 kB
        Mike Matrigali
      4. objectdb.zip
        938 kB
        Tim Wu
      5. sane_output.zip
        148 kB
        Tim Wu

        Issue Links

          Activity

          Hide
          Rick Hillegas added a comment -

          Closing this issue due to lack of activity. We can re-open this if more information becomes available.

          Show
          Rick Hillegas added a comment - Closing this issue due to lack of activity. We can re-open this if more information becomes available.
          Hide
          Rick Hillegas added a comment -

          Hi Mike,

          Looks like Tim attached a broken db in November. Was that useful in understanding what is causing his crash? Is there more information which Tim can provide? Thanks.

          Show
          Rick Hillegas added a comment - Hi Mike, Looks like Tim attached a broken db in November. Was that useful in understanding what is causing his crash? Is there more information which Tim can provide? Thanks.
          Hide
          Tim Wu added a comment -

          After a few more months of testing, we seem to have hit the same crash again. I've attached the crashed db. In terms of getting into the situation, all I really know is that we just killed the process during some db operations.

          java.sql.SQLException: Failed to start database '/export1/cruise/jenkins/workspace/terracotta_trunk_tests_JDK_1.5_derby_crash/trunk/community/dso-crash-tests/target/temp/ObjectDataSynchronousWriteTest/objectdb/datadb' with class loader sun.misc.Launcher$AppClassLoader@198dfaf, see the next exception for details.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
          at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
          at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
          at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
          at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
          at org.apache.derby.impl.jdbc.EmbedConnection30.<init>(Unknown Source)
          at org.apache.derby.jdbc.Driver30.getNewEmbedConnection(Unknown Source)
          at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
          at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
          at java.sql.DriverManager.getConnection(DriverManager.java:525)
          at java.sql.DriverManager.getConnection(DriverManager.java:140)
          at com.tc.objectserver.storage.derby.DerbyDBEnvironment.openDatabase(DerbyDBEnvironment.java:243)
          at com.tc.objectserver.storage.derby.DerbyDBEnvironment.open(DerbyDBEnvironment.java:192)
          at com.tc.objectserver.persistence.db.DBPersistorImpl.open(DBPersistorImpl.java:122)
          at com.tc.objectserver.persistence.db.DBPersistorImpl.<init>(DBPersistorImpl.java:60)
          at com.tc.objectserver.impl.DistributedObjectServer.start(DistributedObjectServer.java:612)
          at com.tc.server.TCServerImpl.startDSOServer(TCServerImpl.java:487)
          at com.tc.server.TCServerImpl.access$600(TCServerImpl.java:84)
          at com.tc.server.TCServerImpl$StartAction.execute(TCServerImpl.java:435)
          at com.tc.lang.StartupHelper.startUp(StartupHelper.java:39)
          at com.tc.server.TCServerImpl.startServer(TCServerImpl.java:466)
          at com.tc.server.TCServerImpl.start(TCServerImpl.java:243)
          at com.tc.server.TCServerMain.main(TCServerMain.java:30)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:592)
          at com.tc.lcp.LinkedJavaProcessStarter.main(LinkedJavaProcessStarter.java:34)
          Caused by: java.sql.SQLException: An exception was thrown during transaction abort.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
          at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
          ... 25 more
          Caused by: ERROR XSTB0: An exception was thrown during transaction abort.
          at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
          at org.apache.derby.impl.store.raw.xact.Xact.abort(Unknown Source)
          at org.apache.derby.impl.store.raw.xact.XactFactory.rollbackAllTransactions(Unknown Source)
          at org.apache.derby.impl.store.raw.log.LogToFile.recover(Unknown Source)
          at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
          at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
          at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
          at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
          at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
          at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
          at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
          ... 25 more
          Caused by: ERROR XSLA8: Cannot rollback transaction 5021, trying to compensate null operation with null
          at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
          at org.apache.derby.impl.store.raw.log.FileLogger.undo(Unknown Source)
          ... 47 more
          Caused by: ERROR XSLA1: Log Record has been sent to the stream, but it cannot be applied to the store (Object null). This may cause recovery problems also.
          at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
          at org.apache.derby.impl.store.raw.log.FileLogger.logAndUndo(Unknown Source)
          at org.apache.derby.impl.store.raw.xact.Xact.logAndUndo(Unknown Source)
          ... 48 more
          Caused by: ERROR XSDB0: Unexpected exception on in-memory page Page(1,Container(0, 1409))
          at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
          at org.apache.derby.impl.store.raw.data.StoredPage.storeRecordForInsert(Unknown Source)
          at org.apache.derby.impl.store.raw.data.StoredPage.storeRecord(Unknown Source)
          at org.apache.derby.impl.store.raw.data.PurgeOperation.undoMe(Unknown Source)
          at org.apache.derby.impl.store.raw.data.PhysicalUndoOperation.doMe(Unknown Source)
          ... 50 more

          This failure was with Derby 10.8.2.2, so it should include the previous fixes.

          Show
          Tim Wu added a comment - After a few more months of testing, we seem to have hit the same crash again. I've attached the crashed db. In terms of getting into the situation, all I really know is that we just killed the process during some db operations. java.sql.SQLException: Failed to start database '/export1/cruise/jenkins/workspace/terracotta_trunk_tests_JDK_1.5_derby_crash/trunk/community/dso-crash-tests/target/temp/ObjectDataSynchronousWriteTest/objectdb/datadb' with class loader sun.misc.Launcher$AppClassLoader@198dfaf, see the next exception for details. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection30.<init>(Unknown Source) at org.apache.derby.jdbc.Driver30.getNewEmbedConnection(Unknown Source) at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source) at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source) at java.sql.DriverManager.getConnection(DriverManager.java:525) at java.sql.DriverManager.getConnection(DriverManager.java:140) at com.tc.objectserver.storage.derby.DerbyDBEnvironment.openDatabase(DerbyDBEnvironment.java:243) at com.tc.objectserver.storage.derby.DerbyDBEnvironment.open(DerbyDBEnvironment.java:192) at com.tc.objectserver.persistence.db.DBPersistorImpl.open(DBPersistorImpl.java:122) at com.tc.objectserver.persistence.db.DBPersistorImpl.<init>(DBPersistorImpl.java:60) at com.tc.objectserver.impl.DistributedObjectServer.start(DistributedObjectServer.java:612) at com.tc.server.TCServerImpl.startDSOServer(TCServerImpl.java:487) at com.tc.server.TCServerImpl.access$600(TCServerImpl.java:84) at com.tc.server.TCServerImpl$StartAction.execute(TCServerImpl.java:435) at com.tc.lang.StartupHelper.startUp(StartupHelper.java:39) at com.tc.server.TCServerImpl.startServer(TCServerImpl.java:466) at com.tc.server.TCServerImpl.start(TCServerImpl.java:243) at com.tc.server.TCServerMain.main(TCServerMain.java:30) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:592) at com.tc.lcp.LinkedJavaProcessStarter.main(LinkedJavaProcessStarter.java:34) Caused by: java.sql.SQLException: An exception was thrown during transaction abort. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) ... 25 more Caused by: ERROR XSTB0: An exception was thrown during transaction abort. at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.store.raw.xact.Xact.abort(Unknown Source) at org.apache.derby.impl.store.raw.xact.XactFactory.rollbackAllTransactions(Unknown Source) at org.apache.derby.impl.store.raw.log.LogToFile.recover(Unknown Source) at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source) at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source) at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source) at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source) ... 25 more Caused by: ERROR XSLA8: Cannot rollback transaction 5021, trying to compensate null operation with null at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.store.raw.log.FileLogger.undo(Unknown Source) ... 47 more Caused by: ERROR XSLA1: Log Record has been sent to the stream, but it cannot be applied to the store (Object null). This may cause recovery problems also. at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.store.raw.log.FileLogger.logAndUndo(Unknown Source) at org.apache.derby.impl.store.raw.xact.Xact.logAndUndo(Unknown Source) ... 48 more Caused by: ERROR XSDB0: Unexpected exception on in-memory page Page(1,Container(0, 1409)) at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.store.raw.data.StoredPage.storeRecordForInsert(Unknown Source) at org.apache.derby.impl.store.raw.data.StoredPage.storeRecord(Unknown Source) at org.apache.derby.impl.store.raw.data.PurgeOperation.undoMe(Unknown Source) at org.apache.derby.impl.store.raw.data.PhysicalUndoOperation.doMe(Unknown Source) ... 50 more This failure was with Derby 10.8.2.2, so it should include the previous fixes.
          Hide
          Mike Matrigali added a comment -

          It has been awhile, have you had any luck reproducing after taking fixes for DERBY-5258 and DERBY-5284?

          Show
          Mike Matrigali added a comment - It has been awhile, have you had any luck reproducing after taking fixes for DERBY-5258 and DERBY-5284 ?
          Hide
          Mike Matrigali added a comment -

          thanks for update. If you can manage to keep a copy of the db after reproducing the error I will take a look. Otherwise a standalone test that you can provide probably is best step at this point.

          Show
          Mike Matrigali added a comment - thanks for update. If you can manage to keep a copy of the db after reproducing the error I will take a look. Otherwise a standalone test that you can provide probably is best step at this point.
          Hide
          Tim Wu added a comment -

          I've updated the derby snapshot we're using to "Apache Derby - 10.8.1.5 - (1138570)" which should include the fix to DERBY-5284. However, we hit the same db corruption issue again. Unfortunately our test infrastructure issue wasn't fixed yet, so we've lost the db files again.

          Caused by: java.sql.SQLException: An exception was thrown during transaction abort.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
          at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
          ... 25 more
          Caused by: ERROR XSTB0: An exception was thrown during transaction abort.
          at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
          at org.apache.derby.impl.store.raw.xact.Xact.abort(Unknown Source)
          at org.apache.derby.impl.store.raw.xact.XactFactory.rollbackAllTransactions(Unknown Source)
          at org.apache.derby.impl.store.raw.log.LogToFile.recover(Unknown Source)
          at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
          at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
          at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
          at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
          at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
          at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
          at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
          at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
          ... 25 more
          Caused by: ERROR XSLA8: Cannot rollback transaction 3166, trying to compensate null operation with null
          at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
          at org.apache.derby.impl.store.raw.log.FileLogger.undo(Unknown Source)
          ... 47 more
          Caused by: ERROR XSLA1: Log Record has been sent to the stream, but it cannot be applied to the store (Object null). This may cause recovery problems also.
          at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
          at org.apache.derby.impl.store.raw.log.FileLogger.logAndUndo(Unknown Source)
          at org.apache.derby.impl.store.raw.xact.Xact.logAndUndo(Unknown Source)
          ... 48 more
          Caused by: ERROR XSDB0: Unexpected exception on in-memory page Page(1,Container(0, 1409))
          at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
          at org.apache.derby.impl.store.raw.data.StoredPage.storeRecordForInsert(Unknown Source)
          at org.apache.derby.impl.store.raw.data.StoredPage.storeRecord(Unknown Source)
          at org.apache.derby.impl.store.raw.data.PurgeOperation.undoMe(Unknown Source)
          at org.apache.derby.impl.store.raw.data.PhysicalUndoOperation.doMe(Unknown Source)

          As for running our tests, I'm working on getting them runnable outside our firewall. It shouldn't be a problem, there's just some dependency issues that need to be dealt with. I'll update again once that's runnable.

          Show
          Tim Wu added a comment - I've updated the derby snapshot we're using to "Apache Derby - 10.8.1.5 - (1138570)" which should include the fix to DERBY-5284 . However, we hit the same db corruption issue again. Unfortunately our test infrastructure issue wasn't fixed yet, so we've lost the db files again. Caused by: java.sql.SQLException: An exception was thrown during transaction abort. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) ... 25 more Caused by: ERROR XSTB0: An exception was thrown during transaction abort. at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.store.raw.xact.Xact.abort(Unknown Source) at org.apache.derby.impl.store.raw.xact.XactFactory.rollbackAllTransactions(Unknown Source) at org.apache.derby.impl.store.raw.log.LogToFile.recover(Unknown Source) at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source) at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source) at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source) at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source) ... 25 more Caused by: ERROR XSLA8: Cannot rollback transaction 3166, trying to compensate null operation with null at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.store.raw.log.FileLogger.undo(Unknown Source) ... 47 more Caused by: ERROR XSLA1: Log Record has been sent to the stream, but it cannot be applied to the store (Object null). This may cause recovery problems also. at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.store.raw.log.FileLogger.logAndUndo(Unknown Source) at org.apache.derby.impl.store.raw.xact.Xact.logAndUndo(Unknown Source) ... 48 more Caused by: ERROR XSDB0: Unexpected exception on in-memory page Page(1,Container(0, 1409)) at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.store.raw.data.StoredPage.storeRecordForInsert(Unknown Source) at org.apache.derby.impl.store.raw.data.StoredPage.storeRecord(Unknown Source) at org.apache.derby.impl.store.raw.data.PurgeOperation.undoMe(Unknown Source) at org.apache.derby.impl.store.raw.data.PhysicalUndoOperation.doMe(Unknown Source) As for running our tests, I'm working on getting them runnable outside our firewall. It shouldn't be a problem, there's just some dependency issues that need to be dealt with. I'll update again once that's runnable.
          Hide
          Mike Matrigali added a comment -

          thanks, definitely add a post once you have had a chance to run your tests for awhile on latest 10.8 branch letting us know new failures or lack of failures. At this point I have made all the changes and backporting I know about and am not going to do more work on this until more info is available. If you ever can post your tests such that others can run them, I would be able to get a machine and some time to run them.

          If you are able to run your tests for awhile with no failures, I would like to close the current open issues and assume DERBY-5294 and DERBY-5258 fixed the issue and of course we can reopen them or open new ones if you get a new failure in the future.

          Show
          Mike Matrigali added a comment - thanks, definitely add a post once you have had a chance to run your tests for awhile on latest 10.8 branch letting us know new failures or lack of failures. At this point I have made all the changes and backporting I know about and am not going to do more work on this until more info is available. If you ever can post your tests such that others can run them, I would be able to get a machine and some time to run them. If you are able to run your tests for awhile with no failures, I would like to close the current open issues and assume DERBY-5294 and DERBY-5258 fixed the issue and of course we can reopen them or open new ones if you get a new failure in the future.
          Hide
          Tim Wu added a comment -

          No, this particular run didn't have the fix yet. I'll update the snapshot we're using and see how thing work from there.

          Show
          Tim Wu added a comment - No, this particular run didn't have the fix yet. I'll update the snapshot we're using and see how thing work from there.
          Hide
          Mike Matrigali added a comment -

          Tim, have you picked up the latest version of the 10.8 branch which will also include change #1138892, the fix for DERBY-5294?

          Show
          Mike Matrigali added a comment - Tim, have you picked up the latest version of the 10.8 branch which will also include change #1138892, the fix for DERBY-5294 ?
          Hide
          Tim Wu added a comment -

          Just wanted to report in that we hit what appears to be the same exact failure (the stack trace looks identical save for specifics like the transaction id) in our tests again today. Though unfortunately due to some mishap with our testing infrastructure the db and all logs were lost. I'll report back with another crashed db if we see it, in case it proves useful.

          Show
          Tim Wu added a comment - Just wanted to report in that we hit what appears to be the same exact failure (the stack trace looks identical save for specifics like the transaction id) in our tests again today. Though unfortunately due to some mishap with our testing infrastructure the db and all logs were lost. I'll report back with another crashed db if we see it, in case it proves useful.
          Hide
          Mike Matrigali added a comment -

          I found another code path that could be causing the problem. I am working on a fix to btree split code. Hopefully will have a patch, and tests
          run by early next week. I searched all calls to purgeAtSlot and this other code path seems like the only other problem area. I will likely
          file another JIRA for the code inspection fix and link it to this issue, as I still can't repro this for myself.

          Show
          Mike Matrigali added a comment - I found another code path that could be causing the problem. I am working on a fix to btree split code. Hopefully will have a patch, and tests run by early next week. I searched all calls to purgeAtSlot and this other code path seems like the only other problem area. I will likely file another JIRA for the code inspection fix and link it to this issue, as I still can't repro this for myself.
          Hide
          Mike Matrigali added a comment -

          Yes, the M indicates modifications in the source tree, so seems right for a one off build with personal changes. If you are just building a straight 10.8 with none of your
          changes I would suggest you just have a clean 10.8 branch and do a svn update to the current level. And build that. Then the
          build number should reflect that it has the change in it.

          I am at a loss on how this bug can occur after my fix. If you are sure you repro'd in your environment with the fix, then I can't
          think of anything else other than as rick suggested. Post your program so that others can run it and see if we can repro. If I can
          repro it I am sure I can fix it. So far I have been just guessing as I said. I am pretty sure my fix was correct, but I guess there
          could be some other code path I missed. I will think about it a little more, and look a little more at this db.

          The developers on this list have access to some big stress testing machines, so if we had a test case we could increase the
          running of it significantly.

          Show
          Mike Matrigali added a comment - Yes, the M indicates modifications in the source tree, so seems right for a one off build with personal changes. If you are just building a straight 10.8 with none of your changes I would suggest you just have a clean 10.8 branch and do a svn update to the current level. And build that. Then the build number should reflect that it has the change in it. I am at a loss on how this bug can occur after my fix. If you are sure you repro'd in your environment with the fix, then I can't think of anything else other than as rick suggested. Post your program so that others can run it and see if we can repro. If I can repro it I am sure I can fix it. So far I have been just guessing as I said. I am pretty sure my fix was correct, but I guess there could be some other code path I missed. I will think about it a little more, and look a little more at this db. The developers on this list have access to some big stress testing machines, so if we had a test case we could increase the running of it significantly.
          Hide
          Tim Wu added a comment -

          I find this in the startup message:

          Booting Derby version The Apache Software Foundation - Apache Derby - 10.8.1.2 - (1095077M)

          Looks like the 1095077 part is the revision number (the revision that 10.8.1.2 is based off of). The only change I have in the source tree I built off over the base 10.8.1.2 revision is the DERBY-5258 fix.

          I'm guessing the "M" indicates modifications in the source tree?

          Show
          Tim Wu added a comment - I find this in the startup message: Booting Derby version The Apache Software Foundation - Apache Derby - 10.8.1.2 - (1095077M) Looks like the 1095077 part is the revision number (the revision that 10.8.1.2 is based off of). The only change I have in the source tree I built off over the base 10.8.1.2 revision is the DERBY-5258 fix. I'm guessing the "M" indicates modifications in the source tree?
          Hide
          Mike Matrigali added a comment -

          Below is a small exerpt from fulllogrecorddump.log.
          It shows that:
          transaction 2989 does a number of purges to page 1 of container 1409.
          transaction 2993 does a subsequent insert to same page with no commit/abort from 2989
          transaction 2992 does a subsequent insert and commit to same page with no commit/abort from 2989

          This is the scenario that the fix for DERBY-5258 tried to address.

          ... lots of log records before this
          DEBUG LogTrace OUTPUT: scanned 2989 : Page Operation: Page(1,Container(0, 1409)) pageVersion 5753 : Purge : 1 slots starting at 2 (recordId=665) insta
          nt = (1,3364851) logEnd = (1,3364912) logIn at 22 available 23^M
          DEBUG LogTrace OUTPUT: scanned 2989 : Page Operation: Page(1,Container(0, 1409)) pageVersion 5754 : Purge : 1 slots starting at 1 (recordId=664) insta
          nt = (1,3364912) logEnd = (1,3364973) logIn at 22 available 23^M
          DEBUG LogTrace OUTPUT: scanned 2993 : Page Operation: Page(1,Container(0, 1409)) pageVersion 5755 : Insert : Slot=336 recordId=2149 instant = (1,3364
          973) logEnd = (1,3365040) logIn at 25 available 26^M
          DEBUG LogTrace OUTPUT: scanned 2992 : Page Operation: Page(1,Container(0, 1409)) pageVersion 5756 : Insert : Slot=336 recordId=2150 instant = (1,3365
          040) logEnd = (1,3365107) logIn at 25 available 26^M
          DEBUG LogTrace OUTPUT: scanned 2992 : Page Operation: Page(13,Container(0, 1168)) pageVersion 650 : Update Slot=11 recordId=17 instant = (1,3365107)
          logEnd = (1,3365276) logIn at 20 available 133^M
          DEBUG LogTrace OUTPUT: scanned 2992 : Page Operation: Page(13,Container(0, 1168)) pageVersion 651 : Update Slot=14 recordId=20 instant = (1,3365276)
          logEnd = (1,3365836) logIn at 20 available 524^M
          DEBUG LogTrace OUTPUT: scanned 2992 : EndXact null Committed : transactionStatus = Committed instant = (1,3365836) logEnd = (1,3365869) logIn at 13 a
          vailable 4^M
          ... this is last log record, thus where the log on disk was when the crash happened.

          From the sane attempt to boot we get the following. The internal checks notice after the insert that a problem exists on the page.
          The problem is that allowing inserts onto the page after the purges are committed or aborted may lead to page corruption. The inserts
          take up space on the page that abort of the purges need. When the purges are undone, they can overwrite other rows or row headers leading
          to a number of different boot problems that may each create a different error when booting for the same underlying cause.

          Show
          Mike Matrigali added a comment - Below is a small exerpt from fulllogrecorddump.log. It shows that: transaction 2989 does a number of purges to page 1 of container 1409. transaction 2993 does a subsequent insert to same page with no commit/abort from 2989 transaction 2992 does a subsequent insert and commit to same page with no commit/abort from 2989 This is the scenario that the fix for DERBY-5258 tried to address. ... lots of log records before this DEBUG LogTrace OUTPUT: scanned 2989 : Page Operation: Page(1,Container(0, 1409)) pageVersion 5753 : Purge : 1 slots starting at 2 (recordId=665) insta nt = (1,3364851) logEnd = (1,3364912) logIn at 22 available 23^M DEBUG LogTrace OUTPUT: scanned 2989 : Page Operation: Page(1,Container(0, 1409)) pageVersion 5754 : Purge : 1 slots starting at 1 (recordId=664) insta nt = (1,3364912) logEnd = (1,3364973) logIn at 22 available 23^M DEBUG LogTrace OUTPUT: scanned 2993 : Page Operation: Page(1,Container(0, 1409)) pageVersion 5755 : Insert : Slot=336 recordId=2149 instant = (1,3364 973) logEnd = (1,3365040) logIn at 25 available 26^M DEBUG LogTrace OUTPUT: scanned 2992 : Page Operation: Page(1,Container(0, 1409)) pageVersion 5756 : Insert : Slot=336 recordId=2150 instant = (1,3365 040) logEnd = (1,3365107) logIn at 25 available 26^M DEBUG LogTrace OUTPUT: scanned 2992 : Page Operation: Page(13,Container(0, 1168)) pageVersion 650 : Update Slot=11 recordId=17 instant = (1,3365107) logEnd = (1,3365276) logIn at 20 available 133^M DEBUG LogTrace OUTPUT: scanned 2992 : Page Operation: Page(13,Container(0, 1168)) pageVersion 651 : Update Slot=14 recordId=20 instant = (1,3365276) logEnd = (1,3365836) logIn at 20 available 524^M DEBUG LogTrace OUTPUT: scanned 2992 : EndXact null Committed : transactionStatus = Committed instant = (1,3365836) logEnd = (1,3365869) logIn at 13 a vailable 4^M ... this is last log record, thus where the log on disk was when the crash happened. From the sane attempt to boot we get the following. The internal checks notice after the insert that a problem exists on the page. The problem is that allowing inserts onto the page after the purges are committed or aborted may lead to page corruption. The inserts take up space on the page that abort of the purges need. When the purges are undone, they can overwrite other rows or row headers leading to a number of different boot problems that may each create a different error when booting for the same underlying cause.
          Hide
          Mike Matrigali added a comment - - edited

          I will post more details soon, but initial debugging makes this look like same problem as DERBY-5258. Is there
          any chance when you ran your test either you were using an old 10.8 without the fix, or maybe your build did not
          pick up the fix. Do you have a complete derby.log from when you ran the test, the initial startup message should
          have build information that may be helpful.

          Note that once the problem is created by a crash in a server without the fix, it will not be fixed by subsequently trying
          to boot with a server with the fix. The fix intended to make the situation not happen, but does not fix it after it has
          happened.

          Show
          Mike Matrigali added a comment - - edited I will post more details soon, but initial debugging makes this look like same problem as DERBY-5258 . Is there any chance when you ran your test either you were using an old 10.8 without the fix, or maybe your build did not pick up the fix. Do you have a complete derby.log from when you ran the test, the initial startup message should have build information that may be helpful. Note that once the problem is created by a crash in a server without the fix, it will not be fixed by subsequently trying to boot with a server with the fix. The fix intended to make the situation not happen, but does not fix it after it has happened.
          Hide
          Mike Matrigali added a comment -

          attaching runtimelogrecorddump.jar which is a jar'd version of derby.log containing the log records actually processed during boot using a SANE server with the following property enabled. This can be used with the other jar to get a
          complete picture of every update that happened in this db
          because the log only has log1.dat which means all log
          records since db was created are included.

          derby.debug.true=LogTrace

          Show
          Mike Matrigali added a comment - attaching runtimelogrecorddump.jar which is a jar'd version of derby.log containing the log records actually processed during boot using a SANE server with the following property enabled. This can be used with the other jar to get a complete picture of every update that happened in this db because the log only has log1.dat which means all log records since db was created are included. derby.debug.true=LogTrace
          Hide
          Mike Matrigali added a comment -

          attaching a jar'd derby.log that contains a complete listing
          of the log records of the objectdb.zip database. This was obtained by fixing up the log directory location in service.properties, adding the following property to derby.properties, and then connecting. This property has
          the system just scan through all the log records without
          actually applying them and then writing debug info about them into the derby.log. It can only be used with a SANE
          build.

          derby.debug.true=DumpLogOnly,LogTrace

          Show
          Mike Matrigali added a comment - attaching a jar'd derby.log that contains a complete listing of the log records of the objectdb.zip database. This was obtained by fixing up the log directory location in service.properties, adding the following property to derby.properties, and then connecting. This property has the system just scan through all the log records without actually applying them and then writing debug info about them into the derby.log. It can only be used with a SANE build. derby.debug.true=DumpLogOnly,LogTrace
          Hide
          Mike Matrigali added a comment -

          could you comment on how easy this was to reproduce (ie. it happened once out N times, or it happened many times ...).
          If you got it to happen more than once, please post at least one other occurence, as it is often helpful to compare what is
          same/different between 2 reproductions.

          Show
          Mike Matrigali added a comment - could you comment on how easy this was to reproduce (ie. it happened once out N times, or it happened many times ...). If you got it to happen more than once, please post at least one other occurence, as it is often helpful to compare what is same/different between 2 reproductions.
          Hide
          Rick Hillegas added a comment -

          Thanks for pushing this forward, Tim. It would be helpful if you could attach a program which performs the selects/updates/inserts that trigger the corruption--that could save us some time tracking this down. Thanks.

          Show
          Rick Hillegas added a comment - Thanks for pushing this forward, Tim. It would be helpful if you could attach a program which performs the selects/updates/inserts that trigger the corruption--that could save us some time tracking this down. Thanks.
          Hide
          Tim Wu added a comment -

          Please note the database log directory will need to be changed to match.

          Show
          Tim Wu added a comment - Please note the database log directory will need to be changed to match.

            People

            • Assignee:
              Mike Matrigali
              Reporter:
              Tim Wu
            • Votes:
              3 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development