Derby
  1. Derby
  2. DERBY-5430

Embedded nstest shows ERROR 40001: A lock could not be obtained due to a deadlock, in nstest

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: 10.8.2.2
    • Fix Version/s: None
    • Component/s: Test
    • Labels:
      None
    • Environment:
      Windows XP, Suse Linux, NsTest Embedded, ibm 1.6 SR9 FP1.
    • Urgency:
      Urgent
    • Bug behavior facts:
      Regression Test Failure

      Description

      When running the nstest system test with Embedded with the 10.8.2.1 release candidate, or a build sync-ed to the top of the 10.8 tree (at level: version: 10.8.2.2 - (1174879M) - the M is to add checks (only do if not null) to number of PreparedStatement.close() calls), I now see a number of deadlock errors if I capture the output to a file.

      The test run with network server did not show this problem.
      10.7.1.1 and 10.8.1.2 did not show this problem.

      There's nothing in derby.log, but this is an example of the error to the console as captured in a file:

      Thread 37 is now running
      ==========> Tester2Thread 37 THREAD starting <======
      Tester2Thread 37 is getting a connection to the database...
      -->Thread Tester2Thread 37 starting with url jdbc:derby:nstestdb;create=true;boo
      tPassword=12345678 <--
      Connection number: 44
      Tester2Thread 37 dbutil.pick_one() -> Obtained row from the table 35979
      Tester2Thread 37 attempting to delete a row with serialkey = 35979
      Tester2Thread 37 deleted row with serialkey 35979 *** SUCCESS ***
      Tester2Thread 37 dbutil.pick_one() -> Obtained row from the table 35978
      java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a
      deadlock, cycle of locks and waiters is:
      Lock : ROW, SYSCOLUMNS, (5,16)
      Waiting XID :

      {1174407, S} , NSTEST, insert into nstesttab (id, t_char, t_dat
      e, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_larg
      e, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?,
      ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000
      031' as clob(1K)),cast(X'0000000000000000000000000000000000000000000000000000000
      00000000000000000000000000000000000000000000000000000000000000000000000000000000
      000000000000000000000000000000000000000000000000000000031' as blob(10K)))
      Granted XID : {61356, S} , {75588, S} , {147655, S} , {204335, S} , {460963, S } , {511711, S} , {565264, S} , {727656, S} , {791364, S} , {1133117, S}
      Lock : ROW, SYSCOLUMNS, (5,16)
      Waiting XID : {1133117, X} , NSTEST, insert into nstesttab (id, t_char, t_dat
      e, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_larg
      e, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?,
      ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000
      031' as clob(1K)),cast(X'0000000000000000000000000000000000000000000000000000000
      00000000000000000000000000000000000000000000000000000000000000000000000000000000
      000000000000000000000000000000000000000000000000000000031' as blob(10K)))
      Lock : ROW, SYSCOLUMNS, (5,16)
      Waiting XID : {147655, X} , NSTEST, insert into nstesttab (id, t_char, t_date
      , t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large
      , t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?,
      ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('000000000000000000000000000000000
      31' as clob(1K)),cast(X'00000000000000000000000000000000000000000000000000000000
      00000000000000000000000000000000000000000000000000000000000000000000000000000000
      00000000000000000000000000000000000000000000000000000031' as blob(10K)))
      Lock : ROW, SYSCOLUMNS, (5,16)
      Waiting XID : {61356, X} , NSTEST, insert into nstesttab (id, t_char, t_date,
      t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large,
      t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?
      , ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('0000000000000000000000000000000003
      1' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000
      00000000000000000000000000000000000000000000000000000000000000000000000000000000
      0000000000000000000000000000000000000000000000000000031' as blob(10K)))
      . The selected victim is XID : 1174407.
      at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unkn
      own Source)
      at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source
      )
      at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException
      (Unknown Source)
      at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Un
      known Source)
      at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown So
      urce)
      at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown So
      urce)
      at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown So
      urce)
      at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Un
      known Source)
      at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unkno
      wn Source)
      at org.apache.derbyTesting.system.nstest.utils.DbUtil.add_one_row(DbUtil
      .java:201)
      at org.apache.derbyTesting.system.nstest.tester.TesterObject.doIUDOperat
      ion(TesterObject.java:148)
      at org.apache.derbyTesting.system.nstest.tester.Tester1.startTesting(Tes
      ter1.java:118)
      at org.apache.derbyTesting.system.nstest.NsTest.run(NsTest.java:551)
      Caused by: java.sql.SQLException: A lock could not be obtained due to a deadlock
      , cycle of locks and waiters is:
      Lock : ROW, SYSCOLUMNS, (5,16)
      Waiting XID : {1174407, S}

      , NSTEST, insert into nstesttab (id, t_char, t_dat
      e, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_larg
      e, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?,
      ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000
      031' as clob(1K)),cast(X'0000000000000000000000000000000000000000000000000000000
      00000000000000000000000000000000000000000000000000000000000000000000000000000000
      000000000000000000000000000000000000000000000000000000031' as blob(10K)))
      Granted XID :

      {61356, S}

      ,

      {75588, S}

      ,

      {147655, S}

      ,

      {204335, S}

      ,

      {460963, S }

      ,

      {511711, S}

      ,

      {565264, S}

      ,

      {727656, S}

      ,

      {791364, S}

      ,

      {1133117, S}

      Lock : ROW, SYSCOLUMNS, (5,16)
      Waiting XID :

      {1133117, X}

      , NSTEST, insert into nstesttab (id, t_char, t_dat
      e, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_larg
      e, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?,
      ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000
      031' as clob(1K)),cast(X'0000000000000000000000000000000000000000000000000000000
      00000000000000000000000000000000000000000000000000000000000000000000000000000000
      000000000000000000000000000000000000000000000000000000031' as blob(10K)))
      Lock : ROW, SYSCOLUMNS, (5,16)
      Waiting XID :

      {147655, X}

      , NSTEST, insert into nstesttab (id, t_char, t_date
      , t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large
      , t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?,
      ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('000000000000000000000000000000000
      31' as clob(1K)),cast(X'00000000000000000000000000000000000000000000000000000000
      00000000000000000000000000000000000000000000000000000000000000000000000000000000
      00000000000000000000000000000000000000000000000000000031' as blob(10K)))
      Lock : ROW, SYSCOLUMNS, (5,16)
      Waiting XID :

      {61356, X}

      , NSTEST, insert into nstesttab (id, t_char, t_date,
      t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large,
      t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?
      , ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('0000000000000000000000000000000003
      1' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000
      00000000000000000000000000000000000000000000000000000000000000000000000000000000
      0000000000000000000000000000000000000000000000000000031' as blob(10K)))
      . The selected victim is XID : 1174407.
      at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknow
      n Source)
      at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransport
      AcrossDRDA(Unknown Source)
      ... 13 more

        Issue Links

          Activity

          Hide
          Myrna van Lunteren added a comment -

          Marking won't fix as part of 10.10 triage.

          Show
          Myrna van Lunteren added a comment - Marking won't fix as part of 10.10 triage.
          Hide
          Myrna van Lunteren added a comment -

          I have run nstest in a small configuration a number of times after the fix for DERBY-4437 was backed out of trunk as per DERBY-5687, and have seen the deadlock on SYSCOLUMNS once. Before, it would pop up in about half of the runs. So it seems that this is definitely improved.

          I do see a lot more 40XL1 errors (A lock could not be obtained within the time requested), but that seems a less serious error.

          I'm leaving this issue open, because it is still occurring. We can decide to mark it as won't fix at some point in the future.

          Show
          Myrna van Lunteren added a comment - I have run nstest in a small configuration a number of times after the fix for DERBY-4437 was backed out of trunk as per DERBY-5687 , and have seen the deadlock on SYSCOLUMNS once. Before, it would pop up in about half of the runs. So it seems that this is definitely improved. I do see a lot more 40XL1 errors (A lock could not be obtained within the time requested), but that seems a less serious error. I'm leaving this issue open, because it is still occurring. We can decide to mark it as won't fix at some point in the future.
          Hide
          Myrna van Lunteren added a comment -

          This issue remains here to describe the deadlock on SYSCOLUMNS.
          Another issue (DERBY-5454) was created to highlight the deadlock on select max(serial_key) in NSTESTTAB. It seems that issue might possibly be a design issue with the test.

          As I understand it though, this deadlock is more serious because there isn't something a user can change in the design of a test to change it. Perhaps I've misunderstood. But for now, I am assuming this issue is related to DERBY-5661 (assert failure indicating the transaction is not pristine), and DERBY-5428 (assert failure indicating trouble with a live cacheable).

          I have added various run output details to DERBY-5661 that also show the deadlock on SYSCOLUMNS.

          Show
          Myrna van Lunteren added a comment - This issue remains here to describe the deadlock on SYSCOLUMNS. Another issue ( DERBY-5454 ) was created to highlight the deadlock on select max(serial_key) in NSTESTTAB. It seems that issue might possibly be a design issue with the test. As I understand it though, this deadlock is more serious because there isn't something a user can change in the design of a test to change it. Perhaps I've misunderstood. But for now, I am assuming this issue is related to DERBY-5661 (assert failure indicating the transaction is not pristine), and DERBY-5428 (assert failure indicating trouble with a live cacheable). I have added various run output details to DERBY-5661 that also show the deadlock on SYSCOLUMNS.
          Hide
          Knut Anders Hatlen added a comment -

          I see many similar deadlocks on 10.7.1.1, but so far none identical to the ones reported in this issue. The deadlocks I see on 10.7 are like a mix of DERBY-5454 and this issue, as they contain some transactions waiting for a lock on row (5,16) in SYSCOLUMNS and some transactions waiting for locks in NSTESTTAB. Here's a simple example which contains one of each:

          java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
          Lock : ROW, NSTESTTAB, (592,53)
          Waiting XID :

          {94984, S}

          , NSTEST, select max(serialkey) from nstesttab where serialkey > ?
          Granted XID :

          {94979, X}
          Lock : ROW, SYSCOLUMNS, (5,16)
          Waiting XID : {94979, X}

          , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K)))
          Granted XID :

          {94984, X}


          . The selected victim is XID : 94984.

          Show
          Knut Anders Hatlen added a comment - I see many similar deadlocks on 10.7.1.1, but so far none identical to the ones reported in this issue. The deadlocks I see on 10.7 are like a mix of DERBY-5454 and this issue, as they contain some transactions waiting for a lock on row (5,16) in SYSCOLUMNS and some transactions waiting for locks in NSTESTTAB. Here's a simple example which contains one of each: java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is: Lock : ROW, NSTESTTAB, (592,53) Waiting XID : {94984, S} , NSTEST, select max(serialkey) from nstesttab where serialkey > ? Granted XID : {94979, X} Lock : ROW, SYSCOLUMNS, (5,16) Waiting XID : {94979, X} , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K))) Granted XID : {94984, X} . The selected victim is XID : 94984.
          Hide
          Knut Anders Hatlen added a comment -

          And I also see this deadlock with the 10.8.2.2 release candidate. So it seems like this issue is neither a new bug in 10.8.2 nor fixed by backing out DERBY-4437.

          Show
          Knut Anders Hatlen added a comment - And I also see this deadlock with the 10.8.2.2 release candidate. So it seems like this issue is neither a new bug in 10.8.2 nor fixed by backing out DERBY-4437 .
          Hide
          Knut Anders Hatlen added a comment -

          I started a run using derby.jar from 10.8.1.2 and derbyTesting.jar from 10.8.2.2, and saw this exception, which looks similar to the exception in the bug description:

          java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
          Lock : ROW, SYSCOLUMNS, (5,16)
          Waiting XID :

          {100348, X}

          , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K)))
          Granted XID :

          {100279, X}


          Lock : ROW, NSTESTTAB, (610,7)
          Waiting XID :

          {100279, S}

          , NSTEST, select max(serialkey) from nstesttab where serialkey > ?
          Granted XID :

          {100307, X}
          Lock : ROW, SYSCOLUMNS, (5,16)
          Waiting XID : {100307, X}

          , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K)))
          Lock : ROW, SYSCOLUMNS, (5,16)
          Waiting XID :

          {100352, X}

          , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K)))
          Lock : ROW, SYSCOLUMNS, (5,16)
          Waiting XID :

          {100314, X}

          , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K)))
          . The selected victim is XID : 100348.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
          at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
          at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
          at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
          at org.apache.derbyTesting.system.nstest.utils.DbUtil.add_one_row(DbUtil.java:201)
          at org.apache.derbyTesting.system.nstest.tester.TesterObject.doIUDOperation(TesterObject.java:148)
          at org.apache.derbyTesting.system.nstest.tester.Tester2.startTesting(Tester2.java:109)
          at org.apache.derbyTesting.system.nstest.NsTest.run(NsTest.java:555)

          Show
          Knut Anders Hatlen added a comment - I started a run using derby.jar from 10.8.1.2 and derbyTesting.jar from 10.8.2.2, and saw this exception, which looks similar to the exception in the bug description: java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is: Lock : ROW, SYSCOLUMNS, (5,16) Waiting XID : {100348, X} , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K))) Granted XID : {100279, X} Lock : ROW, NSTESTTAB, (610,7) Waiting XID : {100279, S} , NSTEST, select max(serialkey) from nstesttab where serialkey > ? Granted XID : {100307, X} Lock : ROW, SYSCOLUMNS, (5,16) Waiting XID : {100307, X} , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K))) Lock : ROW, SYSCOLUMNS, (5,16) Waiting XID : {100352, X} , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K))) Lock : ROW, SYSCOLUMNS, (5,16) Waiting XID : {100314, X} , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K))) . The selected victim is XID : 100348. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source) at org.apache.derbyTesting.system.nstest.utils.DbUtil.add_one_row(DbUtil.java:201) at org.apache.derbyTesting.system.nstest.tester.TesterObject.doIUDOperation(TesterObject.java:148) at org.apache.derbyTesting.system.nstest.tester.Tester2.startTesting(Tester2.java:109) at org.apache.derbyTesting.system.nstest.NsTest.run(NsTest.java:555)
          Hide
          Kathey Marsden added a comment -

          I think it would be good to get clarification from Myrna and a posting of the exact deadlock she encountered to determine if it is the preexisting one or DERBY-5430 but I know she is out until Monday.

          Show
          Kathey Marsden added a comment - I think it would be good to get clarification from Myrna and a posting of the exact deadlock she encountered to determine if it is the preexisting one or DERBY-5430 but I know she is out until Monday.
          Hide
          Rick Hillegas added a comment -

          Hi Kathey,

          If I correctly understand Myrna's first comment today, she saw this deadlock with the current release candidate (10.8.2.2). Thanks.

          Show
          Rick Hillegas added a comment - Hi Kathey, If I correctly understand Myrna's first comment today, she saw this deadlock with the current release candidate (10.8.2.2). Thanks.
          Hide
          Kathey Marsden added a comment -

          I think I lost track. I had thought the deadlock reported in this issue (as well as the NPE DERBY-5422) had been resolved in 10.8 by backing out DERBY-4437 and then a different pre-existing deadlock showed in the test.

          Show
          Kathey Marsden added a comment - I think I lost track. I had thought the deadlock reported in this issue (as well as the NPE DERBY-5422 ) had been resolved in 10.8 by backing out DERBY-4437 and then a different pre-existing deadlock showed in the test.
          Hide
          Rick Hillegas added a comment -

          This issue is what caused us to back out the concurrency improvements introduced by DERBY-4437. I have not seen any evidence that this problem appears in 10.8.1 or earlier. Note that there is another deadlock in this test which does appear in 10.8.1 but that is not this deadlock. Is this an accurate summary of the evidence so far? Thanks.

          Show
          Rick Hillegas added a comment - This issue is what caused us to back out the concurrency improvements introduced by DERBY-4437 . I have not seen any evidence that this problem appears in 10.8.1 or earlier. Note that there is another deadlock in this test which does appear in 10.8.1 but that is not this deadlock. Is this an accurate summary of the evidence so far? Thanks.
          Hide
          Myrna van Lunteren added a comment -

          Changing the urgency. The community has analyzed this issue and non of the fixes backported since the release of 10.8.1 seem relevant, thus it is suspected this issue is of longer standing and thus not a regression (at least not from 10.8.1 to 10.8.2).

          Show
          Myrna van Lunteren added a comment - Changing the urgency. The community has analyzed this issue and non of the fixes backported since the release of 10.8.1 seem relevant, thus it is suspected this issue is of longer standing and thus not a regression (at least not from 10.8.1 to 10.8.2).
          Hide
          Myrna van Lunteren added a comment -

          I saw this again with 10.8.2.2 (RC3 - 1181258).

          Show
          Myrna van Lunteren added a comment - I saw this again with 10.8.2.2 (RC3 - 1181258).
          Hide
          Kathey Marsden added a comment -

          I am not sure if it helps in this case, but I thought I would mention It might be useful to set the new property derby.stream.error.extendedDiagSeverityLevel to 30000 to get a thread dump on the deadlocks.

          http://db.apache.org/derby/docs/10.8/ref/rrefproperextdiagsevlevel.html

          Show
          Kathey Marsden added a comment - I am not sure if it helps in this case, but I thought I would mention It might be useful to set the new property derby.stream.error.extendedDiagSeverityLevel to 30000 to get a thread dump on the deadlocks. http://db.apache.org/derby/docs/10.8/ref/rrefproperextdiagsevlevel.html
          Hide
          Rick Hillegas added a comment -

          The deadlock trace is very simple in the reproduction on my machine. Two transactions have shared locks on the SYSCOLUMNS row and both are waiting to get exclusive locks on the row:

          Exception when preparing or executing insert prepared stmt
          java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
          Lock : ROW, SYSCOLUMNS, (5,16)
          Waiting XID :

          {58916, X}

          , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K)))
          Granted XID :

          {58906, S}
          Lock : ROW, SYSCOLUMNS, (5,16)
          Waiting XID : {58906, X} , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K)))
          Granted XID : {58906, S}

          ,

          {58916, S}


          . The selected victim is XID : 58916.

          Show
          Rick Hillegas added a comment - The deadlock trace is very simple in the reproduction on my machine. Two transactions have shared locks on the SYSCOLUMNS row and both are waiting to get exclusive locks on the row: Exception when preparing or executing insert prepared stmt java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is: Lock : ROW, SYSCOLUMNS, (5,16) Waiting XID : {58916, X} , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K))) Granted XID : {58906, S} Lock : ROW, SYSCOLUMNS, (5,16) Waiting XID : {58906, X} , NSTEST, insert into nstesttab (id, t_char, t_date, t_decimal, t_decimal_nn, t_double, t_float, t_int, t_longint, t_numeric_large, t_real, t_smallint, t_time, t_timestamp, t_varchar,t_clob,t_blob) values ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,cast('00000000000000000000000000000000031' as clob(1K)),cast(X'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031' as blob(10K))) Granted XID : {58906, S} , {58916, S} . The selected victim is XID : 58916.
          Hide
          Rick Hillegas added a comment -

          Thanks for that analysis Knut and thanks for the extra info, Myrna. By being patient and running the test longer, I have been able to reproduce the problem.

          Show
          Rick Hillegas added a comment - Thanks for that analysis Knut and thanks for the extra info, Myrna. By being patient and running the test longer, I have been able to reproduce the problem.
          Hide
          Knut Anders Hatlen added a comment -

          Rick, I think your reading of the deadlock trace is correct, and killing tx 1174407 isn't helpful for breaking the cycle. Picking victims that already hold locks involved in the cycle obviously would have resolved the deadlock quicker and with less victims. However, I think the chosen victim technically may be part of the cycle.

          It is blocked because there's one or more transactions ahead of it in the wait queue that try to lock the same row exclusively (cannot tell exactly which transactions are ahead of it, since that information isn't provided by the trace). That transaction is blocked by the transactions that have been granted shared lock on the row. One (or more) of these transactions is waiting for an exclusive lock on the row, and is right behind 1174407 in the wait queue, and is therefore waiting for 1174407. This means 1174407 is indirectly waiting for itself, and we have a cycle.

          The deadlock detection algorithm picks the first tx it finds in the cycle as the victim. It doesn't really try to find the "best" victim. I assume there would be two more deadlock exceptions in the log not so long after this one, killing two of the transactions waiting for X locks.

          An alternative reading of the stack trace (more in line with your reading, and since the order of the wait queue isn't specified, just as reasonable) is that 1174407 attempted to lock the row after the three transactions waiting for an exclusive lock had requested X locks. In that case, none of the transactions are waiting for tx 1174407, and it should not have been picked as a victim.

          Show
          Knut Anders Hatlen added a comment - Rick, I think your reading of the deadlock trace is correct, and killing tx 1174407 isn't helpful for breaking the cycle. Picking victims that already hold locks involved in the cycle obviously would have resolved the deadlock quicker and with less victims. However, I think the chosen victim technically may be part of the cycle. It is blocked because there's one or more transactions ahead of it in the wait queue that try to lock the same row exclusively (cannot tell exactly which transactions are ahead of it, since that information isn't provided by the trace). That transaction is blocked by the transactions that have been granted shared lock on the row. One (or more) of these transactions is waiting for an exclusive lock on the row, and is right behind 1174407 in the wait queue, and is therefore waiting for 1174407. This means 1174407 is indirectly waiting for itself, and we have a cycle. The deadlock detection algorithm picks the first tx it finds in the cycle as the victim. It doesn't really try to find the "best" victim. I assume there would be two more deadlock exceptions in the log not so long after this one, killing two of the transactions waiting for X locks. An alternative reading of the stack trace (more in line with your reading, and since the order of the wait queue isn't specified, just as reasonable) is that 1174407 attempted to lock the row after the three transactions waiting for an exclusive lock had requested X locks. In that case, none of the transactions are waiting for tx 1174407, and it should not have been picked as a victim.
          Hide
          Myrna van Lunteren added a comment -

          Rick, thank you for trying out this process and looking at the error...

          To answer your question about how soon to expect this, it varies significantly...

          One run I started on 9/23 with insane jars, ran into the first deadlock after about 4 minutes.
          In another run (that one was with sane jars) the first 40001 error happened after about 2 days, 6 hours of running...
          (Both of these were with modified DbUtils.java but the 10.8 tree sync-ed up to 1174879.)

          These tests are supposed to be long running; - with 10.8.1.2 it ran for 8 days without getting any deadlocks...

          Show
          Myrna van Lunteren added a comment - Rick, thank you for trying out this process and looking at the error... To answer your question about how soon to expect this, it varies significantly... One run I started on 9/23 with insane jars, ran into the first deadlock after about 4 minutes. In another run (that one was with sane jars) the first 40001 error happened after about 2 days, 6 hours of running... (Both of these were with modified DbUtils.java but the 10.8 tree sync-ed up to 1174879.) These tests are supposed to be long running; - with 10.8.1.2 it ran for 8 days without getting any deadlocks...
          Hide
          Rick Hillegas added a comment -

          Not sure if I am reading the deadlock cycle correctly. Would appreciate advice from someone who is better at reading these traces.

          It looks to me as though many transactions have shared locks on the SYSCOLUMNS row for the identity column. Three of these transactions are also waiting to get exclusive locks on the row:

          61356
          147655
          1133117

          I would expect all of these transactions to be wedged. None of them can get an exclusive lock while the others hold shared locks. None of them is going to release its shared lock until it gets an exclusive lock.

          The victim stumbles on this logjam. The victim is trying to get a shared lock on the row. I don't understand why this transaction was selected as the victim. He doesn't seem to be involved in a cycle and killing him won't break the logjam. My confusion suggests to me that I don't know how to read this deadlock trace.

          Thanks,
          -Rick

          Show
          Rick Hillegas added a comment - Not sure if I am reading the deadlock cycle correctly. Would appreciate advice from someone who is better at reading these traces. It looks to me as though many transactions have shared locks on the SYSCOLUMNS row for the identity column. Three of these transactions are also waiting to get exclusive locks on the row: 61356 147655 1133117 I would expect all of these transactions to be wedged. None of them can get an exclusive lock while the others hold shared locks. None of them is going to release its shared lock until it gets an exclusive lock. The victim stumbles on this logjam. The victim is trying to get a shared lock on the row. I don't understand why this transaction was selected as the victim. He doesn't seem to be involved in a cycle and killing him won't break the logjam. My confusion suggests to me that I don't know how to read this deadlock trace. Thanks, -Rick
          Hide
          Rick Hillegas added a comment -

          Hm, my comment from yesterday does not seem to have been accepted. Re-trying:

          How are you running this test and how soon after starting it up do you see the deadlocks? Using the following command, I ran the test for 3 minutes but I did not see any deadlocks in either the output file or derby.log:

          java org.apache.derbyTesting.system.nstest.NsTest Embedded > z.out

          Thanks,
          -Rick

          Show
          Rick Hillegas added a comment - Hm, my comment from yesterday does not seem to have been accepted. Re-trying: How are you running this test and how soon after starting it up do you see the deadlocks? Using the following command, I ran the test for 3 minutes but I did not see any deadlocks in either the output file or derby.log: java org.apache.derbyTesting.system.nstest.NsTest Embedded > z.out Thanks, -Rick

            People

            • Assignee:
              Unassigned
              Reporter:
              Myrna van Lunteren
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development