Derby
  1. Derby
  2. DERBY-5448

In the 10.8 branch, back out the concurrency improvements introduced by using SequenceUpdaters to allocate identity values.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.8.2.2
    • Fix Version/s: 10.8.2.2
    • Component/s: Documentation, SQL
    • Labels:
      None

      Description

      New errors were seen while running NsTest against the 10.8.2 release candidates. These errors are recorded as DERBY-5430 and DERBY-5422. The first error is a deadlock which occurs when inserting into the main table of the test. The second error is a problem clearing the identity cache. These errors have caused people to lose confidence in the concurrency improvements introduced by DERBY-4437. We should back the DERBY-4437 changes out of the 10.8 branch and use the trunk to continue debugging the problems disclosed by NsTest.

        Issue Links

          Activity

          Hide
          Rick Hillegas added a comment -

          Attaching derby-5448-01-aa-backoutConcurrencyChanges.diff. Regression tests ran cleanly for me against this patch. This patch backs out the following ports to the 10.8 branch and reverts 10.8 to the old style of identity generation:

          DERBY-4437 1141645: This is the master commit which ported most of the 10.9 changes to 10.8

          DERBY-4437 1142052: This commit ported an upgrade test from 10.9 to 10.8. The test verifies the new identity behavior.

          DERBY-5408 1170178: This commit ported the localization fixes for the 2200H message from 10.9 to the 10.8 branch.

          DERBY-5426 1174297: This commit ported some SequenceUpdater changes from 10.9 to the 10.8 branch. The changes improved the error reporting when there was too much contention on an identity column.

          -----------------------------

          I still see deadlocks in NsTest after applying this patch. However, the deadlocks do not involve identity generation as far as I can tell. I don't know enough about NsTest to say whether these deadlocks warrant additional investigation before building a new 10.8.2 release candidate. Here is the deadlock cycle I see now:

          java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
          Lock : ROW, NSTESTTAB, (605,25)
          Waiting XID :

          {98562, S}

          , NSTEST, select max(serialkey) from nstesttab where serialkey > ?
          Granted XID :

          {98556, X}


          Lock : ROW, NSTESTTAB, (605,29)
          Waiting XID :

          {98556, S}

          , NSTEST, select max(serialkey) from nstesttab where serialkey > ?
          Lock : ROW, NSTESTTAB, (605,29)
          Waiting XID :

          {98530, X}

          , NSTEST, update nstesttab set t_char = ? where serialkey = 36733
          Lock : ROW, NSTESTTAB, (605,29)
          Waiting XID :

          {98571, X}

          , NSTEST, update nstesttab set t_double = ? where serialkey = 36733
          Lock : ROW, NSTESTTAB, (605,29)
          Waiting XID :

          {98536, X}

          , NSTEST, update nstesttab set t_timestamp = ? where serialkey = 36733
          Lock : ROW, NSTESTTAB, (605,29)
          Waiting XID :

          {98509, X}

          , NSTEST, delete from nstesttab where serialkey = ?
          Lock : ROW, NSTESTTAB, (605,29)
          Waiting XID :

          {98533, X}

          , NSTEST, delete from nstesttab where serialkey = ?
          Granted XID :

          {98516, X}


          Lock : ROW, NSTESTTAB, (605,25)
          Waiting XID :

          {98516, S}

          , NSTEST, select max(serialkey) from nstesttab where serialkey > ?
          Lock : ROW, NSTESTTAB, (605,25)
          Waiting XID :

          {98576, X}

          , NSTEST, update nstesttab set t_longint = ? where serialkey = 36729
          . The selected victim is XID : 98562.

          -----------------------------

          Touches the following files:

          M java/storeless/org/apache/derby/impl/storeless/EmptyDictionary.java
          M java/engine/org/apache/derby/impl/sql/compile/CreateSequenceNode.java
          M java/engine/org/apache/derby/impl/sql/compile/NextSequenceNode.java
          M java/engine/org/apache/derby/impl/sql/execute/InsertResultSet.java
          M java/engine/org/apache/derby/impl/sql/execute/BaseActivation.java
          M java/engine/org/apache/derby/impl/sql/execute/InsertConstantAction.java
          M java/engine/org/apache/derby/impl/sql/catalog/SequenceGenerator.java
          M java/engine/org/apache/derby/impl/sql/catalog/DataDictionaryImpl.java
          D java/engine/org/apache/derby/impl/sql/catalog/SequenceRange.java
          M java/engine/org/apache/derby/impl/sql/catalog/SequenceUpdater.java
          M java/engine/org/apache/derby/impl/db/BasicDatabase.java
          M java/engine/org/apache/derby/iapi/sql/dictionary/DataDictionary.java
          M java/engine/org/apache/derby/iapi/sql/dictionary/SequenceDescriptor.java
          M java/engine/org/apache/derby/iapi/reference/Property.java
          D java/engine/org/apache/derby/catalog/SequencePreallocator.java
          M java/engine/org/apache/derby/loc/messages_de_DE.properties
          M java/engine/org/apache/derby/loc/messages_ko_KR.properties
          M java/engine/org/apache/derby/loc/messages_fr.properties
          M java/engine/org/apache/derby/loc/messages_es.properties
          M java/engine/org/apache/derby/loc/messages_it.properties
          M java/engine/org/apache/derby/loc/messages_ja_JP.properties
          M java/engine/org/apache/derby/loc/messages.xml
          M java/engine/org/apache/derby/loc/messages_zh_TW.properties
          M java/shared/org/apache/derby/shared/common/reference/SQLState.java
          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/AlterTableTest.java
          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/AutoIncrementTest.java
          D java/testing/org/apache/derbyTesting/functionTests/tests/lang/t_4437_2.dat
          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/SequenceGeneratorTest.java
          M java/testing/org/apache/derbyTesting/functionTests/tests/upgradeTests/UpgradeRun.java
          D java/testing/org/apache/derbyTesting/functionTests/tests/upgradeTests/Changes10_8_2.java
          M java/testing/org/apache/derbyTesting/functionTests/tests/upgradeTests/UpgradeChange.java
          M tools/javadoc/publishedapi.ant

          Show
          Rick Hillegas added a comment - Attaching derby-5448-01-aa-backoutConcurrencyChanges.diff. Regression tests ran cleanly for me against this patch. This patch backs out the following ports to the 10.8 branch and reverts 10.8 to the old style of identity generation: DERBY-4437 1141645: This is the master commit which ported most of the 10.9 changes to 10.8 DERBY-4437 1142052: This commit ported an upgrade test from 10.9 to 10.8. The test verifies the new identity behavior. DERBY-5408 1170178: This commit ported the localization fixes for the 2200H message from 10.9 to the 10.8 branch. DERBY-5426 1174297: This commit ported some SequenceUpdater changes from 10.9 to the 10.8 branch. The changes improved the error reporting when there was too much contention on an identity column. ----------------------------- I still see deadlocks in NsTest after applying this patch. However, the deadlocks do not involve identity generation as far as I can tell. I don't know enough about NsTest to say whether these deadlocks warrant additional investigation before building a new 10.8.2 release candidate. Here is the deadlock cycle I see now: java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is: Lock : ROW, NSTESTTAB, (605,25) Waiting XID : {98562, S} , NSTEST, select max(serialkey) from nstesttab where serialkey > ? Granted XID : {98556, X} Lock : ROW, NSTESTTAB, (605,29) Waiting XID : {98556, S} , NSTEST, select max(serialkey) from nstesttab where serialkey > ? Lock : ROW, NSTESTTAB, (605,29) Waiting XID : {98530, X} , NSTEST, update nstesttab set t_char = ? where serialkey = 36733 Lock : ROW, NSTESTTAB, (605,29) Waiting XID : {98571, X} , NSTEST, update nstesttab set t_double = ? where serialkey = 36733 Lock : ROW, NSTESTTAB, (605,29) Waiting XID : {98536, X} , NSTEST, update nstesttab set t_timestamp = ? where serialkey = 36733 Lock : ROW, NSTESTTAB, (605,29) Waiting XID : {98509, X} , NSTEST, delete from nstesttab where serialkey = ? Lock : ROW, NSTESTTAB, (605,29) Waiting XID : {98533, X} , NSTEST, delete from nstesttab where serialkey = ? Granted XID : {98516, X} Lock : ROW, NSTESTTAB, (605,25) Waiting XID : {98516, S} , NSTEST, select max(serialkey) from nstesttab where serialkey > ? Lock : ROW, NSTESTTAB, (605,25) Waiting XID : {98576, X} , NSTEST, update nstesttab set t_longint = ? where serialkey = 36729 . The selected victim is XID : 98562. ----------------------------- Touches the following files: M java/storeless/org/apache/derby/impl/storeless/EmptyDictionary.java M java/engine/org/apache/derby/impl/sql/compile/CreateSequenceNode.java M java/engine/org/apache/derby/impl/sql/compile/NextSequenceNode.java M java/engine/org/apache/derby/impl/sql/execute/InsertResultSet.java M java/engine/org/apache/derby/impl/sql/execute/BaseActivation.java M java/engine/org/apache/derby/impl/sql/execute/InsertConstantAction.java M java/engine/org/apache/derby/impl/sql/catalog/SequenceGenerator.java M java/engine/org/apache/derby/impl/sql/catalog/DataDictionaryImpl.java D java/engine/org/apache/derby/impl/sql/catalog/SequenceRange.java M java/engine/org/apache/derby/impl/sql/catalog/SequenceUpdater.java M java/engine/org/apache/derby/impl/db/BasicDatabase.java M java/engine/org/apache/derby/iapi/sql/dictionary/DataDictionary.java M java/engine/org/apache/derby/iapi/sql/dictionary/SequenceDescriptor.java M java/engine/org/apache/derby/iapi/reference/Property.java D java/engine/org/apache/derby/catalog/SequencePreallocator.java M java/engine/org/apache/derby/loc/messages_de_DE.properties M java/engine/org/apache/derby/loc/messages_ko_KR.properties M java/engine/org/apache/derby/loc/messages_fr.properties M java/engine/org/apache/derby/loc/messages_es.properties M java/engine/org/apache/derby/loc/messages_it.properties M java/engine/org/apache/derby/loc/messages_ja_JP.properties M java/engine/org/apache/derby/loc/messages.xml M java/engine/org/apache/derby/loc/messages_zh_TW.properties M java/shared/org/apache/derby/shared/common/reference/SQLState.java M java/testing/org/apache/derbyTesting/functionTests/tests/lang/AlterTableTest.java M java/testing/org/apache/derbyTesting/functionTests/tests/lang/AutoIncrementTest.java D java/testing/org/apache/derbyTesting/functionTests/tests/lang/t_4437_2.dat M java/testing/org/apache/derbyTesting/functionTests/tests/lang/SequenceGeneratorTest.java M java/testing/org/apache/derbyTesting/functionTests/tests/upgradeTests/UpgradeRun.java D java/testing/org/apache/derbyTesting/functionTests/tests/upgradeTests/Changes10_8_2.java M java/testing/org/apache/derbyTesting/functionTests/tests/upgradeTests/UpgradeChange.java M tools/javadoc/publishedapi.ant
          Hide
          Rick Hillegas added a comment -

          Committed derby-5448-01-aa-backoutConcurrencyChanges.diff to 10.8 branch at subversion revision 1179717.

          Show
          Rick Hillegas added a comment - Committed derby-5448-01-aa-backoutConcurrencyChanges.diff to 10.8 branch at subversion revision 1179717.
          Hide
          Myrna van Lunteren added a comment -

          How disappointing. I'll apply your patch and see if I can see those deadlocks also.

          Show
          Myrna van Lunteren added a comment - How disappointing. I'll apply your patch and see if I can see those deadlocks also.
          Hide
          Myrna van Lunteren added a comment -

          I see you committed. Thanks for doing the backing out Rick. I'll try nstest with the branch sync-ed up and will report back...

          Show
          Myrna van Lunteren added a comment - I see you committed. Thanks for doing the backing out Rick. I'll try nstest with the branch sync-ed up and will report back...
          Hide
          Rick Hillegas added a comment - - edited

          Attaching derby-5448-02-aa-backoutDocsChanges.diff. This patch removes the 10.8 documentation of the derby.language.sequence.preallocator property. Committed to the 10.8 docs branch at subversion revision 1179759.

          This patch backs out the following commit:

          DERBY-5307 1141651: This commit ported documentation of the derby.language.sequence.preallocator property from 10.9 docs to 10.8 docs. Note that the property still has meaning for sequences although the property would be less capable after backing out 1141645.

          Touches the following files:

          M src/ref/rrefsqlj37836.dita
          M src/ref/rrefsqljcreatesequence.dita
          D src/ref/rrefproperpreallocator.dita
          M src/ref/crefproper22250.dita
          M src/ref/refderby.ditamap

          Show
          Rick Hillegas added a comment - - edited Attaching derby-5448-02-aa-backoutDocsChanges.diff. This patch removes the 10.8 documentation of the derby.language.sequence.preallocator property. Committed to the 10.8 docs branch at subversion revision 1179759. This patch backs out the following commit: DERBY-5307 1141651: This commit ported documentation of the derby.language.sequence.preallocator property from 10.9 docs to 10.8 docs. Note that the property still has meaning for sequences although the property would be less capable after backing out 1141645. Touches the following files: M src/ref/rrefsqlj37836.dita M src/ref/rrefsqljcreatesequence.dita D src/ref/rrefproperpreallocator.dita M src/ref/crefproper22250.dita M src/ref/refderby.ditamap
          Hide
          Myrna van Lunteren added a comment -

          One more thing I thought to mention: I've not seen deadlocks in this test since I started running them (for QA cycle for 10.5, I think), but comments in the test lead me to believe they may have been possible.
          I'm referring to this section in the Tester1 class:
          // Too many deadlocks amd locking issues if this is not commented out
          try

          { connex.setAutoCommit(false); }

          catch (Exception e)

          { System.out.println("FAIL: " + getThread_id() + "'s setAutoCommit() failed:"); printException("setting AutoCommit", e); }

          // also set isolation level to Connection.TRANSACTION_READ_UNCOMMITTED
          // to reduce number of
          // deadlocks
          setIsolationLevel(Connection.TRANSACTION_READ_UNCOMMITTED);
          So perhaps a different hardware/jvm combination is more/less likely to run into deadlock situations...?

          Show
          Myrna van Lunteren added a comment - One more thing I thought to mention: I've not seen deadlocks in this test since I started running them (for QA cycle for 10.5, I think), but comments in the test lead me to believe they may have been possible. I'm referring to this section in the Tester1 class: // Too many deadlocks amd locking issues if this is not commented out try { connex.setAutoCommit(false); } catch (Exception e) { System.out.println("FAIL: " + getThread_id() + "'s setAutoCommit() failed:"); printException("setting AutoCommit", e); } // also set isolation level to Connection.TRANSACTION_READ_UNCOMMITTED // to reduce number of // deadlocks setIsolationLevel(Connection.TRANSACTION_READ_UNCOMMITTED); So perhaps a different hardware/jvm combination is more/less likely to run into deadlock situations...?
          Hide
          Mike Matrigali added a comment -

          I agree that the deadlocks do not seem identity related given the trace. It might shed light if anyone can reproduce with full lock tracing on and get a full lock table dump.

          it looks like muliple threads are doing "max's" while in a multi-statement transaction and holding previous X row locks on the table leading
          to classic user level deadlock.

          There was a change to MAX in 10.8 ( DERBY-642), but I do not know if these queries go through that code path. That would not explain
          it if nstest is getting deadlock in 10.8 RC, but not in the initial 10.8 release.

          Show
          Mike Matrigali added a comment - I agree that the deadlocks do not seem identity related given the trace. It might shed light if anyone can reproduce with full lock tracing on and get a full lock table dump. it looks like muliple threads are doing "max's" while in a multi-statement transaction and holding previous X row locks on the table leading to classic user level deadlock. There was a change to MAX in 10.8 ( DERBY-642 ), but I do not know if these queries go through that code path. That would not explain it if nstest is getting deadlock in 10.8 RC, but not in the initial 10.8 release.
          Hide
          Myrna van Lunteren added a comment -

          DERBY-642 is marked as fixed in 10.8.1.2 - so the initial 10.8 release...

          Show
          Myrna van Lunteren added a comment - DERBY-642 is marked as fixed in 10.8.1.2 - so the initial 10.8 release...
          Hide
          Mike Matrigali added a comment -

          That deadlock error message format is wierd to me. Did it change? I am used to seeing all grant and wait for each lock like this example from the documentation:
          ERROR 40001: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
          Lock : ROW, T, (1,8)
          Waiting XID :

          {235, S}

          , APP, select * from t where id = 2
          Granted XID :

          {238, X}

          Lock : ROW, T, (1,7)
          Waiting XID :

          {238, S}

          , APP, select * from t where id = 1
          Granted XID :

          {235, X}

          . The selected victim is XID : 235.

          Show
          Mike Matrigali added a comment - That deadlock error message format is wierd to me. Did it change? I am used to seeing all grant and wait for each lock like this example from the documentation: ERROR 40001: A lock could not be obtained due to a deadlock, cycle of locks and waiters is: Lock : ROW, T, (1,8) Waiting XID : {235, S} , APP, select * from t where id = 2 Granted XID : {238, X} Lock : ROW, T, (1,7) Waiting XID : {238, S} , APP, select * from t where id = 1 Granted XID : {235, X} . The selected victim is XID : 235.
          Hide
          Rick Hillegas added a comment -

          I have re-run NsTest on the sane jars on 10.8 (after checking in derby-5448-01-aa-backoutConcurrencyChanges.diff. I set the following switches on the command line but I don't see any additional information in derby.log. Mike, what switches should I set to get the additional information you need from the lock table?

          -Dderby.stream.error.extendedDiagSeverityLevel=30000 \
          -Dderby.locks.monitor=true \
          -Dderby.locks.deadlockTrace=true \

          Unfortunately, this time around I saw a deadlock which involved the identity column (i.e., the SYSCOLUMNS row). I am including the deadlock trace and the stack trace. The deadlock is found while trying to allocate the next identity value. You can see from the stack trace that this is the old identity generation logic which Derby has been using for the past several years. At this point it may be useful to do the following:

          1) come up with a repro which pops the deadlock faster

          2) perform a binary search on the 10.8 branch, looking for the checkin which introduced this deadlock

          ERROR 40001: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
          Lock : ROW, SYSCOLUMNS, (5,16)
          Waiting XID :

          {98270, X}

          , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K)))
          Granted XID :

          {98178, X}


          Lock : ROW, NSTESTTAB, (603,39)
          Waiting XID :

          {98178, S}

          , NSTEST, select max(serialkey) from nstesttab where serialkey > ?
          Granted XID :

          {98245, X}
          Lock : ROW, SYSCOLUMNS, (5,16)
          Waiting XID : {98245, X}

          , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K)))
          . The selected victim is XID : 98270.
          at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:303)
          at org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:554)
          at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:629)
          at org.apache.derby.impl.services.locks.AbstractPool.lockObject(AbstractPool.java:119)
          at org.apache.derby.impl.services.locks.ConcurrentPool.lockObject(ConcurrentPool.java:28)
          at org.apache.derby.impl.store.raw.xact.RowLocking3.lockRecordForWrite(RowLocking3.java:248)
          at org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.lockPositionForWrite(OpenConglomerate.java:587)
          at org.apache.derby.impl.store.access.conglomerate.GenericConglomerateController.fetch(GenericConglomerateController.java:382)
          at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getSetAutoincrementValue(DataDictionaryImpl.java:8676)
          at org.apache.derby.impl.sql.execute.InsertResultSet.getSetAutoincrementValue(InsertResultSet.java:830)
          at org.apache.derby.impl.sql.execute.BaseActivation.getSetAutoincrementValue(BaseActivation.java:666)
          at org.apache.derby.exe.ac88e70ddcx0132xde76x081bx0000045c93288.e0(Unknown Source)
          at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGeneratedClass.java:139)
          at org.apache.derby.impl.sql.execute.RowResultSet.getNextRowCore(RowResultSet.java:148)
          at org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(NormalizeResultSet.java:185)
          at org.apache.derby.impl.sql.execute.DMLWriteResultSet.getNextRowCore(DMLWriteResultSet.java:127)
          at org.apache.derby.impl.sql.execute.InsertResultSet.open(InsertResultSet.java:504)
          at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:436)
          at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:317)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(EmbedPreparedStatement.java:308)
          at org.apache.derbyTesting.system.nstest.utils.DbUtil.add_one_row(DbUtil.java:201)
          at org.apache.derbyTesting.system.nstest.tester.TesterObject.doIUDOperation(TesterObject.java:148)
          at org.apache.derbyTesting.system.nstest.tester.Tester2.startTesting(Tester2.java:109)
          at org.apache.derbyTesting.system.nstest.NsTest.run(NsTest.java:555)

          Show
          Rick Hillegas added a comment - I have re-run NsTest on the sane jars on 10.8 (after checking in derby-5448-01-aa-backoutConcurrencyChanges.diff. I set the following switches on the command line but I don't see any additional information in derby.log. Mike, what switches should I set to get the additional information you need from the lock table? -Dderby.stream.error.extendedDiagSeverityLevel=30000 \ -Dderby.locks.monitor=true \ -Dderby.locks.deadlockTrace=true \ Unfortunately, this time around I saw a deadlock which involved the identity column (i.e., the SYSCOLUMNS row). I am including the deadlock trace and the stack trace. The deadlock is found while trying to allocate the next identity value. You can see from the stack trace that this is the old identity generation logic which Derby has been using for the past several years. At this point it may be useful to do the following: 1) come up with a repro which pops the deadlock faster 2) perform a binary search on the 10.8 branch, looking for the checkin which introduced this deadlock ERROR 40001: A lock could not be obtained due to a deadlock, cycle of locks and waiters is: Lock : ROW, SYSCOLUMNS, (5,16) Waiting XID : {98270, X} , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K))) Granted XID : {98178, X} Lock : ROW, NSTESTTAB, (603,39) Waiting XID : {98178, S} , NSTEST, select max(serialkey) from nstesttab where serialkey > ? Granted XID : {98245, X} Lock : ROW, SYSCOLUMNS, (5,16) Waiting XID : {98245, X} , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K))) . The selected victim is XID : 98270. at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:303) at org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:554) at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:629) at org.apache.derby.impl.services.locks.AbstractPool.lockObject(AbstractPool.java:119) at org.apache.derby.impl.services.locks.ConcurrentPool.lockObject(ConcurrentPool.java:28) at org.apache.derby.impl.store.raw.xact.RowLocking3.lockRecordForWrite(RowLocking3.java:248) at org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.lockPositionForWrite(OpenConglomerate.java:587) at org.apache.derby.impl.store.access.conglomerate.GenericConglomerateController.fetch(GenericConglomerateController.java:382) at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getSetAutoincrementValue(DataDictionaryImpl.java:8676) at org.apache.derby.impl.sql.execute.InsertResultSet.getSetAutoincrementValue(InsertResultSet.java:830) at org.apache.derby.impl.sql.execute.BaseActivation.getSetAutoincrementValue(BaseActivation.java:666) at org.apache.derby.exe.ac88e70ddcx0132xde76x081bx0000045c93288.e0(Unknown Source) at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGeneratedClass.java:139) at org.apache.derby.impl.sql.execute.RowResultSet.getNextRowCore(RowResultSet.java:148) at org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(NormalizeResultSet.java:185) at org.apache.derby.impl.sql.execute.DMLWriteResultSet.getNextRowCore(DMLWriteResultSet.java:127) at org.apache.derby.impl.sql.execute.InsertResultSet.open(InsertResultSet.java:504) at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:436) at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:317) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(EmbedPreparedStatement.java:308) at org.apache.derbyTesting.system.nstest.utils.DbUtil.add_one_row(DbUtil.java:201) at org.apache.derbyTesting.system.nstest.tester.TesterObject.doIUDOperation(TesterObject.java:148) at org.apache.derbyTesting.system.nstest.tester.Tester2.startTesting(Tester2.java:109) at org.apache.derbyTesting.system.nstest.NsTest.run(NsTest.java:555)
          Hide
          Mike Matrigali added a comment -

          rick, those are the switches I was think of. It looks like lock output has changed since I last debugged these kinds of problems, I will look at code and see if anything else is possible.

          You don't mention how long it takes for you to repro. Since you seem to have the hardware/jvm combination that repro's could you run a test against the existing 10.8 apache release (not the RC but the actual release), just to verify this is a regression in 10.8. I believe these kinds of
          issues are very timing dependent and hardware/JVM can affect the runs greatly.

          Also anyone running nstest on top branch can you mention if you are seeing any of the other issues that we were hoping were going to be fixed
          by backing out 5448 (like DERBY-5422 and DERBY-5428 and maybe others).

          Show
          Mike Matrigali added a comment - rick, those are the switches I was think of. It looks like lock output has changed since I last debugged these kinds of problems, I will look at code and see if anything else is possible. You don't mention how long it takes for you to repro. Since you seem to have the hardware/jvm combination that repro's could you run a test against the existing 10.8 apache release (not the RC but the actual release), just to verify this is a regression in 10.8. I believe these kinds of issues are very timing dependent and hardware/JVM can affect the runs greatly. Also anyone running nstest on top branch can you mention if you are seeing any of the other issues that we were hoping were going to be fixed by backing out 5448 (like DERBY-5422 and DERBY-5428 and maybe others).
          Hide
          Mike Matrigali added a comment -

          I think the following option will print more info on deadlock. It will only work in a SANE server. Note that if you need to set more than one
          debug option it takes a comma separated list of options.

          derby.debug.true=DeadlockTrace

          Show
          Mike Matrigali added a comment - I think the following option will print more info on deadlock. It will only work in a SANE server. Note that if you need to set more than one debug option it takes a comma separated list of options. derby.debug.true=DeadlockTrace
          Hide
          Rick Hillegas added a comment -

          Thanks, Mike. The identity deadlock happened 5.5 minutes after starting NsTest. I will run NsTest against 10.8.1.

          Show
          Rick Hillegas added a comment - Thanks, Mike. The identity deadlock happened 5.5 minutes after starting NsTest. I will run NsTest against 10.8.1.
          Hide
          Rick Hillegas added a comment - - edited

          I have run NsTest five times against the 10.8.1.2 insane jars. Each duration lasted around 10 minutes. I did not see any deadlocks. I will repeat this experiment against the 10.8.1.2 sane jars.

          Show
          Rick Hillegas added a comment - - edited I have run NsTest five times against the 10.8.1.2 insane jars. Each duration lasted around 10 minutes. I did not see any deadlocks. I will repeat this experiment against the 10.8.1.2 sane jars.
          Hide
          Rick Hillegas added a comment -

          I have run NsTest five times against the 10.8.1.2 sane jars. Each duration lasted around 10 minutes. I did not see any deadlocks.

          Show
          Rick Hillegas added a comment - I have run NsTest five times against the 10.8.1.2 sane jars. Each duration lasted around 10 minutes. I did not see any deadlocks.
          Hide
          Mike Matrigali added a comment -

          rick, can you describe the environment of your test run. machine, number of processors, speed, OS, JVM?

          I did a scan of the svn log of all the changes done in 10.8 and nothing jumped out at me. There have a been a few fixes that changed
          sychronization a little to fix bugs and that could have changed timing of the runs, I guess. So far I have just been looking at descriptions
          not code.

          Show
          Mike Matrigali added a comment - rick, can you describe the environment of your test run. machine, number of processors, speed, OS, JVM? I did a scan of the svn log of all the changes done in 10.8 and nothing jumped out at me. There have a been a few fixes that changed sychronization a little to fix bugs and that could have changed timing of the runs, I guess. So far I have just been looking at descriptions not code.
          Hide
          Mike Matrigali added a comment -

          I just reproduced against top of 10.8 branch, SANE, windows XP, single cpu, dual core, ibm laptop t60p. took me 38 minutes.
          running nstest as:
          java \
          -Dderby.stream.error.extendedDiagSeverityLevel=30000 \
          -Dderby.locks.monitor=true \
          -Dderby.locks.deadlockTrace=true \
          org.apache.derbyTesting.system.nstest.NsTest \
          Embedded > nstest.out 2>&1

          Fri Oct 07 15:17:28 PDT 2011 Thread[Thread 36,5,main] (XID = 130487), (SESSIONID = 88), (DATABASE = nstestdb), (DRDAID = null), Failed Statement is:
          insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time,
          t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),ca
          st(X'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
          00000000000000000000000000000000000000000000031' as blob(10K))) with 15 parameters begin parameter #1: 684965125 :end parameter begin parameter #2: CV
          ZXCVZXCV !@#$!@#$ asdfasdf 1 q a :end parameter begin parameter #3: 3713-01-07 :end parameter begin parameter #4: 828.5777992596651 :end parameter be
          gin parameter #5: 0.2472237598957745 :end parameter begin parameter #6: 4.995415321343787E285 :end parameter begin parameter #7: 8.21632369503401E25 :
          end parameter begin parameter #8: -1314135079 :end parameter begin parameter #9: 777072442558566171 :end parameter begin parameter #10: 6.333797131351
          71 :end parameter begin parameter #11: 25.84169 :end parameter begin parameter #12: 32594 :end parameter begin parameter #13: 01:11:04 :end parameter
          begin parameter #14: 2075-01-18 15:28:15.9 :end parameter begin parameter #15: qwerqwer 12341234 ZXCVZXCVZXCV !@#$!@#$ asdfasdf 1 q a z asdf ASDF qwer
          asdfzxcvasdfqwer1234asd# :end parameter ^M
          ERROR 40001: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
          Lock : ROW, SYSCOLUMNS, (5,16)
          Waiting XID :

          {130487, X}

          , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_num
          eric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('0000000000000
          0000000000000000000031' as clob(1K)),cast(X'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
          00000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K)))
          Granted XID :

          {130470, X}

          Lock : ROW, NSTESTTAB, (687,40)
          Waiting XID :

          {130470, S}

          , NSTEST, select max(serialkey) from nstesttab where serialkey > ?
          Granted XID :

          {130455, X}
          Lock : ROW, SYSCOLUMNS, (5,16)
          Waiting XID : {130455, X}

          , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_num
          eric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('0000000000000
          0000000000000000000031' as clob(1K)),cast(X'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
          00000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K)))
          . The selected victim is XID : 130487.^M
          at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:303)^M
          at org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:554)^M
          at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:629)^M
          at org.apache.derby.impl.services.locks.AbstractPool.lockObject(AbstractPool.java:119)^M
          at org.apache.derby.impl.services.locks.ConcurrentPool.lockObject(ConcurrentPool.java:28)^M
          at org.apache.derby.impl.store.raw.xact.RowLocking3.lockRecordForWrite(RowLocking3.java:248)^M
          at org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.lockPositionForWrite(OpenConglomerate.java:587)^M
          at org.apache.derby.impl.store.access.conglomerate.GenericConglomerateController.fetch(GenericConglomerateController.java:382)^M
          at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getSetAutoincrementValue(DataDictionaryImpl.java:8676)^M
          at org.apache.derby.impl.sql.execute.InsertResultSet.getSetAutoincrementValue(InsertResultSet.java:830)^M
          at org.apache.derby.impl.sql.execute.BaseActivation.getSetAutoincrementValue(BaseActivation.java:666)^M
          at org.apache.derby.exe.accf0fdf6ax0132xe04fxbf22x0000001a6cc01f.e0(Unknown Source)^M
          at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGeneratedClass.java:139)^M
          at org.apache.derby.impl.sql.execute.RowResultSet.getNextRowCore(RowResultSet.java:148)^M
          at org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(NormalizeResultSet.java:185)^M
          at org.apache.derby.impl.sql.execute.DMLWriteResultSet.getNextRowCore(DMLWriteResultSet.java:127)^M
          at org.apache.derby.impl.sql.execute.InsertResultSet.open(InsertResultSet.java:504)^M
          at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:436)^M
          at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:317)^M
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)^M
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686)^M
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(EmbedPreparedStatement.java:308)^M
          at org.apache.derbyTesting.system.nstest.utils.DbUtil.add_one_row(DbUtil.java:201)^M
          at org.apache.derbyTesting.system.nstest.tester.TesterObject.doIUDOperation(TesterObject.java:148)^M
          at org.apache.derbyTesting.system.nstest.tester.Tester2.startTesting(Tester2.java:109)^M
          at org.apache.derbyTesting.system.nstest.NsTest.run(NsTest.java:555)^M

          Show
          Mike Matrigali added a comment - I just reproduced against top of 10.8 branch, SANE, windows XP, single cpu, dual core, ibm laptop t60p. took me 38 minutes. running nstest as: java \ -Dderby.stream.error.extendedDiagSeverityLevel=30000 \ -Dderby.locks.monitor=true \ -Dderby.locks.deadlockTrace=true \ org.apache.derbyTesting.system.nstest.NsTest \ Embedded > nstest.out 2>&1 Fri Oct 07 15:17:28 PDT 2011 Thread [Thread 36,5,main] (XID = 130487), (SESSIONID = 88), (DATABASE = nstestdb), (DRDAID = null), Failed Statement is: insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),ca st(X'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000031' as blob(10K))) with 15 parameters begin parameter #1: 684965125 :end parameter begin parameter #2: CV ZXCVZXCV !@#$!@#$ asdfasdf 1 q a :end parameter begin parameter #3: 3713-01-07 :end parameter begin parameter #4: 828.5777992596651 :end parameter be gin parameter #5: 0.2472237598957745 :end parameter begin parameter #6: 4.995415321343787E285 :end parameter begin parameter #7: 8.21632369503401E25 : end parameter begin parameter #8: -1314135079 :end parameter begin parameter #9: 777072442558566171 :end parameter begin parameter #10: 6.333797131351 71 :end parameter begin parameter #11: 25.84169 :end parameter begin parameter #12: 32594 :end parameter begin parameter #13: 01:11:04 :end parameter begin parameter #14: 2075-01-18 15:28:15.9 :end parameter begin parameter #15: qwerqwer 12341234 ZXCVZXCVZXCV !@#$!@#$ asdfasdf 1 q a z asdf ASDF qwer asdfzxcvasdfqwer1234asd# :end parameter ^M ERROR 40001: A lock could not be obtained due to a deadlock, cycle of locks and waiters is: Lock : ROW, SYSCOLUMNS, (5,16) Waiting XID : {130487, X} , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_num eric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('0000000000000 0000000000000000000031' as clob(1K)),cast(X'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K))) Granted XID : {130470, X} Lock : ROW, NSTESTTAB, (687,40) Waiting XID : {130470, S} , NSTEST, select max(serialkey) from nstesttab where serialkey > ? Granted XID : {130455, X} Lock : ROW, SYSCOLUMNS, (5,16) Waiting XID : {130455, X} , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_num eric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('0000000000000 0000000000000000000031' as clob(1K)),cast(X'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K))) . The selected victim is XID : 130487.^M at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:303)^M at org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:554)^M at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:629)^M at org.apache.derby.impl.services.locks.AbstractPool.lockObject(AbstractPool.java:119)^M at org.apache.derby.impl.services.locks.ConcurrentPool.lockObject(ConcurrentPool.java:28)^M at org.apache.derby.impl.store.raw.xact.RowLocking3.lockRecordForWrite(RowLocking3.java:248)^M at org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.lockPositionForWrite(OpenConglomerate.java:587)^M at org.apache.derby.impl.store.access.conglomerate.GenericConglomerateController.fetch(GenericConglomerateController.java:382)^M at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getSetAutoincrementValue(DataDictionaryImpl.java:8676)^M at org.apache.derby.impl.sql.execute.InsertResultSet.getSetAutoincrementValue(InsertResultSet.java:830)^M at org.apache.derby.impl.sql.execute.BaseActivation.getSetAutoincrementValue(BaseActivation.java:666)^M at org.apache.derby.exe.accf0fdf6ax0132xe04fxbf22x0000001a6cc01f.e0(Unknown Source)^M at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGeneratedClass.java:139)^M at org.apache.derby.impl.sql.execute.RowResultSet.getNextRowCore(RowResultSet.java:148)^M at org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(NormalizeResultSet.java:185)^M at org.apache.derby.impl.sql.execute.DMLWriteResultSet.getNextRowCore(DMLWriteResultSet.java:127)^M at org.apache.derby.impl.sql.execute.InsertResultSet.open(InsertResultSet.java:504)^M at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:436)^M at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:317)^M at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)^M at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686)^M at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(EmbedPreparedStatement.java:308)^M at org.apache.derbyTesting.system.nstest.utils.DbUtil.add_one_row(DbUtil.java:201)^M at org.apache.derbyTesting.system.nstest.tester.TesterObject.doIUDOperation(TesterObject.java:148)^M at org.apache.derbyTesting.system.nstest.tester.Tester2.startTesting(Tester2.java:109)^M at org.apache.derbyTesting.system.nstest.NsTest.run(NsTest.java:555)^M
          Hide
          Myrna van Lunteren added a comment - - edited

          For what it's worth, I kicked off 3 runs yesterday, and so far, there are no deadlocks.

          However, on one machine, with sane jars, I did see these errors:
          ERROR XSLA6: Cannot recover the database.
          ERROR 40XT4: An attempt was made to close a transaction that was still active. The transaction has been aborted.

          The first hint of trouble here was:
          org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED inconsistency in space management during insert: slot = 146 getSlotOffset(slot) = 3206 dataWritten = 19 freeSpace = -15 firstFreeByte = 3149 page = ---------------------------------------------------
          page id: Page(378,Container(0, 1153)) Overflow: false PageVersion: 895 SlotsInUse: 159 DeletedRowCount: 93 PageStatus: 1 NextId: 317 firstFreeByte: 3149 freeSpace: -15 totalSpace: 4028 spareSpace: 0% minimumRecordSize : 1 PageSize: 4096

          I logged DERBY-5450 for this and attached the derby.log.

          Show
          Myrna van Lunteren added a comment - - edited For what it's worth, I kicked off 3 runs yesterday, and so far, there are no deadlocks. However, on one machine, with sane jars, I did see these errors: ERROR XSLA6: Cannot recover the database. ERROR 40XT4: An attempt was made to close a transaction that was still active. The transaction has been aborted. The first hint of trouble here was: org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED inconsistency in space management during insert: slot = 146 getSlotOffset(slot) = 3206 dataWritten = 19 freeSpace = -15 firstFreeByte = 3149 page = --------------------------------------------------- page id: Page(378,Container(0, 1153)) Overflow: false PageVersion: 895 SlotsInUse: 159 DeletedRowCount: 93 PageStatus: 1 NextId: 317 firstFreeByte: 3149 freeSpace: -15 totalSpace: 4028 spareSpace: 0% minimumRecordSize : 1 PageSize: 4096 I logged DERBY-5450 for this and attached the derby.log.
          Hide
          Mike Matrigali added a comment -

          I was trying to binary search the deadlock error in the 10.8 branch, but it seems to not reproduce all the time for me. So far I have
          reproduced once as far back as 1170106. but running for 2 hours did not reproduce again. I have not reproduced later than this but
          that particular issue makes no sense to cause this.
          svn update -r 1170106 deadlock in 34 minutes.
          svn update -r 1170106 no deadlock in 120 minutes.

          I think the only thing that can be deduced is that deadlock is caused by something before 1170106 or it has always been around and
          just unlucky to cause it. I'll try a few more, but if anyone has time and sees a deadlock in 10.8 sometime before this change post it.

          Show
          Mike Matrigali added a comment - I was trying to binary search the deadlock error in the 10.8 branch, but it seems to not reproduce all the time for me. So far I have reproduced once as far back as 1170106. but running for 2 hours did not reproduce again. I have not reproduced later than this but that particular issue makes no sense to cause this. svn update -r 1170106 deadlock in 34 minutes. svn update -r 1170106 no deadlock in 120 minutes. I think the only thing that can be deduced is that deadlock is caused by something before 1170106 or it has always been around and just unlucky to cause it. I'll try a few more, but if anyone has time and sees a deadlock in 10.8 sometime before this change post it.
          Hide
          Knut Anders Hatlen added a comment -

          Not sure if it's related, but DERBY-3980 (fixed in 10.8.1.2) made the deadlock detector able to recognize some deadlock situations that previously would have been reported as timeouts. So it might be that the deadlocks have always been there, but only recently started getting reported as deadlocks.

          Show
          Knut Anders Hatlen added a comment - Not sure if it's related, but DERBY-3980 (fixed in 10.8.1.2) made the deadlock detector able to recognize some deadlock situations that previously would have been reported as timeouts. So it might be that the deadlocks have always been there, but only recently started getting reported as deadlocks.
          Hide
          Rick Hillegas added a comment -

          Here is some information on my machine:

          Processor: 2.4 GHz Intel core 2 duo
          Memory: 4 GB 667 MHz DDR2 SDRAM
          OS: Mac OS X 10.6.8
          JVM: Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3515) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode)

          Show
          Rick Hillegas added a comment - Here is some information on my machine: Processor: 2.4 GHz Intel core 2 duo Memory: 4 GB 667 MHz DDR2 SDRAM OS: Mac OS X 10.6.8 JVM: Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3515) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode)
          Hide
          Rick Hillegas added a comment -

          I ran NsTest 5 times against the 10.8 branch at revision 1170106. Each run lasted at least 10 minutes. I did not see any deadlocks although I did see lock timeouts.

          Show
          Rick Hillegas added a comment - I ran NsTest 5 times against the 10.8 branch at revision 1170106. Each run lasted at least 10 minutes. I did not see any deadlocks although I did see lock timeouts.
          Hide
          Mike Matrigali added a comment -

          saw a deadlock at revision 1136020 after about an hour. But no deadlock involved the system catalogs. All were of the following form:
          ERROR 40001: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
          Lock : ROW, NSTESTTAB, (682,50)
          Waiting XID :

          {138765, S}

          , NSTEST, select max(serialkey) from nstesttab where serialkey > ?
          Granted XID :

          {138763, X}

          Lock : ROW, NSTESTTAB, (682,53)
          Waiting XID :

          {138763, S}

          , NSTEST, select max(serialkey) from nstesttab where serialkey > ?
          Granted XID :

          {138765, X}

          . The selected victim is XID : 138765.^M
          at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:303)^M
          at org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:554)^M
          at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:629)^M
          at org.apache.derby.impl.services.locks.ConcurrentLockSet.zeroDurationLockObject(ConcurrentLockSet.java:871)^M
          at org.apache.derby.impl.services.locks.AbstractPool.zeroDurationlockObject(AbstractPool.java:297)^M
          at org.apache.derby.impl.services.locks.ConcurrentPool.zeroDurationlockObject(ConcurrentPool.java:28)^M
          at org.apache.derby.impl.store.raw.xact.RowLocking2nohold.lockRecordForRead(RowLocking2nohold.java:89)^M
          at org.apache.derby.impl.store.access.heap.HeapController.lockRow(HeapController.java:520)^M
          at org.apache.derby.impl.store.access.heap.HeapController.lockRow(HeapController.java:638)^M
          at org.apache.derby.impl.store.access.btree.index.B2IRowLocking3.lockRowOnPage(B2IRowLocking3.java:309)^M
          at org.apache.derby.impl.store.access.btree.index.B2IRowLocking3._lockScanRow(B2IRowLocking3.java:599)^M
          at org.apache.derby.impl.store.access.btree.index.B2IRowLockingRR.lockScanRow(B2IRowLockingRR.java:105)^M
          at org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(BTreeForwardScan.java:305)^M
          at org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(BTreeScan.java:1681)^M
          at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.reloadArray(BulkTableScanResultSet.java:353)^M
          at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.getNextRowCore(BulkTableScanResultSet.java:294)^M
          at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(ProjectRestrictResultSet.java:261)^M
          at org.apache.derby.impl.sql.execute.ScalarAggregateResultSet.getRowFromResultSet(ScalarAggregateResultSet.java:355)^M
          at org.apache.derby.impl.sql.execute.ScalarAggregateResultSet.getNextRowCore(ScalarAggregateResultSet.java:179)^M
          at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(ProjectRestrictResultSet.java:261)^M
          at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(BasicNoPutResultSetImpl.java:478)^M
          at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(EmbedResultSet.java:430)^M
          at org.apache.derby.impl.jdbc.EmbedResultSet.next(EmbedResultSet.java:374)^M
          at org.apache.derbyTesting.system.nstest.utils.DbUtil.pick_one(DbUtil.java:527)^M
          at org.apache.derbyTesting.system.nstest.utils.DbUtil.delete_one_row(DbUtil.java:440)^M
          at org.apache.derbyTesting.system.nstest.tester.TesterObject.doIUDOperation(TesterObject.java:175)^M
          at org.apache.derbyTesting.system.nstest.tester.Tester2.startTesting(Tester2.java:109)^M
          at org.apache.derbyTesting.system.nstest.NsTest.run(NsTest.java:555)^M
          Sun Oct 09 21:04:51 PDT 2011 Thread[Thread 2,5,main] (XID = 138763), (SESSIONID = 20), (DATABASE = nstestdb), (DRDAID = null), Clean
          up action starting^M
          Sun Oct 09 21:04:51 PDT 2011 Thread[Thread 2,5,main] (XID = 138763), (SESSIONID = 20), (DATABASE = nstestdb), (DRDAID = null), Faile
          d Statement is: select max(serialkey) from nstesttab where serialkey > ? with 1 parameters begin parameter #1: 6002 :end parameter ^
          M

          Show
          Mike Matrigali added a comment - saw a deadlock at revision 1136020 after about an hour. But no deadlock involved the system catalogs. All were of the following form: ERROR 40001: A lock could not be obtained due to a deadlock, cycle of locks and waiters is: Lock : ROW, NSTESTTAB, (682,50) Waiting XID : {138765, S} , NSTEST, select max(serialkey) from nstesttab where serialkey > ? Granted XID : {138763, X} Lock : ROW, NSTESTTAB, (682,53) Waiting XID : {138763, S} , NSTEST, select max(serialkey) from nstesttab where serialkey > ? Granted XID : {138765, X} . The selected victim is XID : 138765.^M at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:303)^M at org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:554)^M at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:629)^M at org.apache.derby.impl.services.locks.ConcurrentLockSet.zeroDurationLockObject(ConcurrentLockSet.java:871)^M at org.apache.derby.impl.services.locks.AbstractPool.zeroDurationlockObject(AbstractPool.java:297)^M at org.apache.derby.impl.services.locks.ConcurrentPool.zeroDurationlockObject(ConcurrentPool.java:28)^M at org.apache.derby.impl.store.raw.xact.RowLocking2nohold.lockRecordForRead(RowLocking2nohold.java:89)^M at org.apache.derby.impl.store.access.heap.HeapController.lockRow(HeapController.java:520)^M at org.apache.derby.impl.store.access.heap.HeapController.lockRow(HeapController.java:638)^M at org.apache.derby.impl.store.access.btree.index.B2IRowLocking3.lockRowOnPage(B2IRowLocking3.java:309)^M at org.apache.derby.impl.store.access.btree.index.B2IRowLocking3._lockScanRow(B2IRowLocking3.java:599)^M at org.apache.derby.impl.store.access.btree.index.B2IRowLockingRR.lockScanRow(B2IRowLockingRR.java:105)^M at org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(BTreeForwardScan.java:305)^M at org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(BTreeScan.java:1681)^M at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.reloadArray(BulkTableScanResultSet.java:353)^M at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.getNextRowCore(BulkTableScanResultSet.java:294)^M at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(ProjectRestrictResultSet.java:261)^M at org.apache.derby.impl.sql.execute.ScalarAggregateResultSet.getRowFromResultSet(ScalarAggregateResultSet.java:355)^M at org.apache.derby.impl.sql.execute.ScalarAggregateResultSet.getNextRowCore(ScalarAggregateResultSet.java:179)^M at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(ProjectRestrictResultSet.java:261)^M at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(BasicNoPutResultSetImpl.java:478)^M at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(EmbedResultSet.java:430)^M at org.apache.derby.impl.jdbc.EmbedResultSet.next(EmbedResultSet.java:374)^M at org.apache.derbyTesting.system.nstest.utils.DbUtil.pick_one(DbUtil.java:527)^M at org.apache.derbyTesting.system.nstest.utils.DbUtil.delete_one_row(DbUtil.java:440)^M at org.apache.derbyTesting.system.nstest.tester.TesterObject.doIUDOperation(TesterObject.java:175)^M at org.apache.derbyTesting.system.nstest.tester.Tester2.startTesting(Tester2.java:109)^M at org.apache.derbyTesting.system.nstest.NsTest.run(NsTest.java:555)^M Sun Oct 09 21:04:51 PDT 2011 Thread [Thread 2,5,main] (XID = 138763), (SESSIONID = 20), (DATABASE = nstestdb), (DRDAID = null), Clean up action starting^M Sun Oct 09 21:04:51 PDT 2011 Thread [Thread 2,5,main] (XID = 138763), (SESSIONID = 20), (DATABASE = nstestdb), (DRDAID = null), Faile d Statement is: select max(serialkey) from nstesttab where serialkey > ? with 1 parameters begin parameter #1: 6002 :end parameter ^ M
          Hide
          Rick Hillegas added a comment -

          I seem to have lost my ability to reproduce the deadlocks. I have run NsTest 5 times against 10.8.2.1 insane jars without being able to pop the problem (each attempt lasted at least 10 minutes).

          Show
          Rick Hillegas added a comment - I seem to have lost my ability to reproduce the deadlocks. I have run NsTest 5 times against 10.8.2.1 insane jars without being able to pop the problem (each attempt lasted at least 10 minutes).
          Hide
          Rick Hillegas added a comment -

          I was able to pop a deadlock (but not the identity-generator deadlock) running NsTest against the 10.8.2.1 sane jars. I had to run the test 5 times (10 minutes per run) to pop this error. I don't believe that my experiments on 2011-10-07 demonstrate that the deadlocks are NOT present in 10.8.1.2.

          Show
          Rick Hillegas added a comment - I was able to pop a deadlock (but not the identity-generator deadlock) running NsTest against the 10.8.2.1 sane jars. I had to run the test 5 times (10 minutes per run) to pop this error. I don't believe that my experiments on 2011-10-07 demonstrate that the deadlocks are NOT present in 10.8.1.2.
          Hide
          Myrna van Lunteren added a comment -

          I too did see the select max(serialkey) deadlocks on one system - but only with insane jars (on the machine where I usually run insane jars/nstest Embedded during release QA cycles).

          It appears to me that these deadlocks are different from what's described in DERBY-5430.
          As such, I think we should log a separate issue for this, and move the discussion there.

          I think this issue can now be closed, as the backport was completed.

          Show
          Myrna van Lunteren added a comment - I too did see the select max(serialkey) deadlocks on one system - but only with insane jars (on the machine where I usually run insane jars/nstest Embedded during release QA cycles). It appears to me that these deadlocks are different from what's described in DERBY-5430 . As such, I think we should log a separate issue for this, and move the discussion there. I think this issue can now be closed, as the backport was completed.
          Hide
          Rick Hillegas added a comment -

          I agree that there are two different deadlocks:

          1) A deadlock involving locks on a user table and seen with the "select max(serialkey)" query.

          2) A deadlock involving locks on a SYSCOLUMNS row and seen with the "insert into nstesttab" statement.

          (1) may be a bug in NsTest and I agree that it deserves its own JIRA.. Note that I have seen both deadlocks on the 10.8 branch after backing out the DERBY-4437 work. (2) is still a problem in the 10.8 branch. I have no problem with closing this issue as long as DERBY-5430 remains open. Thanks.

          Show
          Rick Hillegas added a comment - I agree that there are two different deadlocks: 1) A deadlock involving locks on a user table and seen with the "select max(serialkey)" query. 2) A deadlock involving locks on a SYSCOLUMNS row and seen with the "insert into nstesttab" statement. (1) may be a bug in NsTest and I agree that it deserves its own JIRA.. Note that I have seen both deadlocks on the 10.8 branch after backing out the DERBY-4437 work. (2) is still a problem in the 10.8 branch. I have no problem with closing this issue as long as DERBY-5430 remains open. Thanks.

            People

            • Assignee:
              Unassigned
              Reporter:
              Rick Hillegas
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development