Derby
DERBY-4275

Query executions fail when compressing a table using SYSCS_UTIL.SYSCS_COMPRESS_TABLE

    Details

    • Urgency:
      Normal
    • Issue & fix info:
      High Value Fix, Repro attached
    • Bug behavior facts:
      Crash, Seen in production

      Description

      Query executions (SELECT and/or UPDATE) fail with serious exceptions while the table is being compressed using SYSCS_UTIL.SYSCS_COMPRESS_TABLE. The compression eventually finishes normally, but the queries keep failing with the same error until the database is rebooted. More information can be found on the Derby mailing list at http://www.nabble.com/Issue-with-SYSCS_UTIL.SYSCS_COMPRESS_-TABLE-td23892893.html. The exception stack trace is below:

      Caused by: java.sql.SQLException: The conglomerate (71,409) requested does not exist.
      at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
      at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
      at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
      at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
      at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
      at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
      at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
      at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
      at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeQuery(Unknown Source)
      at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:93)
      ... 25 more
      Caused by: ERROR XSAI2: The conglomerate (71,409) requested does not exist.
      at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
      at org.apache.derby.impl.store.access.btree.index.B2IFactory.readConglomerate(Unknown Source)
      at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(Unknown Source)
      at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(Unknown Source)
      at org.apache.derby.impl.store.access.RAMTransaction.openScan(Unknown Source)
      at org.apache.derby.impl.store.access.BackingStoreHashTableFromScan.<init>(Unknown Source)
      at org.apache.derby.impl.store.access.RAMTransaction.createBackingStoreHashtableFromScan(Unknown Source)
      at org.apache.derby.impl.sql.execute.HashScanResultSet.openCore(Unknown Source)
      at org.apache.derby.impl.sql.execute.JoinResultSet.openRight(Unknown Source)
      at org.apache.derby.impl.sql.execute.JoinResultSet.openCore(Unknown Source)
      at org.apache.derby.impl.sql.execute.JoinResultSet.openCore(Unknown Source)
      at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.openCore(Unknown Source)
      at org.apache.derby.impl.sql.execute.UnionResultSet.getNextRowCore(Unknown Source)
      at org.apache.derby.impl.sql.execute.SortResultSet.getRowFromResultSet(Unknown Source)
      at org.apache.derby.impl.sql.execute.SortResultSet.getNextRowFromRS(Unknown Source)
      at org.apache.derby.impl.sql.execute.SortResultSet.loadSorter(Unknown Source)
      at org.apache.derby.impl.sql.execute.SortResultSet.openCore(Unknown Source)
      at org.apache.derby.impl.sql.execute.SortResultSet.openCore(Unknown Source)
      at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.open(Unknown Source)
      at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
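
      For reference, the compression procedure at the center of this issue is invoked like this (a minimal sketch; the connection URL, schema, and table names are illustrative):

          import java.sql.CallableStatement;
          import java.sql.Connection;
          import java.sql.DriverManager;
          import java.sql.SQLException;

          public class CompressExample {
              public static void main(String[] args) throws SQLException {
                  // Embedded connection; the database name "test" is illustrative.
                  Connection conn = DriverManager.getConnection("jdbc:derby:test");
                  // SYSCS_COMPRESS_TABLE(schema, table, sequential): a non-zero
                  // third argument requests sequential compression.
                  CallableStatement cs = conn.prepareCall(
                          "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)");
                  cs.setString(1, "APP");
                  cs.setString(2, "TEST");
                  cs.setShort(3, (short) 1);
                  cs.execute();
                  cs.close();
                  conn.close();
              }
          }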

      1. CompressDBTest1.java
        9 kB
        Sai Pullabhotla
      2. CompressDBTest2.java
        8 kB
        Sai Pullabhotla
      3. invalidate-after.diff
        1 kB
        Knut Anders Hatlen
      4. d4275-1a.diff
        11 kB
        Knut Anders Hatlen
      5. invalidation-during-compilation.diff
        2 kB
        Knut Anders Hatlen
      6. D4275.java
        2 kB
        Knut Anders Hatlen
      7. npe.diff
        1 kB
        Knut Anders Hatlen

          Activity

          Sai Pullabhotla added a comment -

          The attached files can be used to reproduce this reported bug.

           CompressDBTest1.java - tries to read a table while it is being compressed. The SELECT statement errors out with the following error:
           java.sql.SQLException: Container 2,192 not found.
           Subsequent SELECTs on a brand-new connection fail with a slightly different error:
           The conglomerate (2,192) requested does not exist.

           CompressDBTest2.java - tries to insert data into a table while it is being compressed. The INSERT eventually errors out with the following error:
           A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
           Lock : TABLE, TEST, Tablelock
             Waiting XID : {236439, IX} , APP, insert into test values(?, ?, ?, ?, ?)
             Granted XID : {234342, X}
           Lock : ROW, SYSCONGLOMERATES, (5,14)
             Waiting XID : {234342, X} , APP, alter table "APP"."TEST" compress sequential
             Granted XID : {234342, S} , {236439, S}
           . The selected victim is XID : 236439.

          What the test classes do:

           Both classes first create a database named test under the current working directory if a directory named test does not already exist. Then a table named test is created and populated with 100,000 random records. Two threads are then started: the first repeatedly compresses the test table, while the second repeatedly executes a DML statement. If the DML statement errors out, the compression thread is stopped. Finally, the database is shut down. The program's standard output goes to stdout.log and its standard error goes to stderr.log in the current working directory.
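
           A condensed sketch of that two-thread pattern (assumes the database and the populated TEST table already exist; names and SQL are illustrative, the attached classes contain the complete logic):

               import java.sql.CallableStatement;
               import java.sql.Connection;
               import java.sql.DriverManager;
               import java.sql.PreparedStatement;
               import java.sql.ResultSet;
               import java.sql.SQLException;
               import java.util.concurrent.atomic.AtomicBoolean;

               public class CompressRepro {
                   static final AtomicBoolean failed = new AtomicBoolean(false);

                   public static void main(String[] args) throws Exception {
                       Thread compressor = new Thread(new Runnable() {
                           public void run() {
                               try {
                                   Connection c = DriverManager.getConnection("jdbc:derby:test");
                                   CallableStatement cs = c.prepareCall(
                                           "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'TEST', 1)");
                                   while (!failed.get()) {
                                       cs.execute(); // compress over and over
                                   }
                                   c.close();
                               } catch (SQLException e) {
                                   e.printStackTrace();
                               }
                           }
                       });
                       Thread dml = new Thread(new Runnable() {
                           public void run() {
                               try {
                                   Connection c = DriverManager.getConnection("jdbc:derby:test");
                                   PreparedStatement ps = c.prepareStatement(
                                           "SELECT COUNT(*) FROM TEST");
                                   while (!failed.get()) {
                                       ResultSet rs = ps.executeQuery(); // fails when the bug strikes
                                       rs.next();
                                       rs.close();
                                   }
                                   c.close();
                               } catch (SQLException e) {
                                   e.printStackTrace();
                                   failed.set(true); // stop the compression thread
                               }
                           }
                       });
                       compressor.start();
                       dml.start();
                       compressor.join();
                       dml.join();
                   }
               }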

          Dag H. Wanvik added a comment -

          Thanks for the repro, Sai! I was able to reproduce with Derby 10.4.1.3.

          Dag H. Wanvik added a comment -

          Triaged for 10.5.2, checking "repro attached".

          Knut Anders Hatlen added a comment -

          Some of the discussion on DERBY-637 may be relevant for this issue as well.

          Brett Wooldridge added a comment -

          Throwing my vote in for this bug. We were planning on enabling online compression, statistics regeneration, and backup in an upcoming release of our software.

          Knut Anders Hatlen added a comment -

          The problem shown by CompressDBTest2.java looks like DERBY-3683.

          The "conglomerate does not exist" errors in CompressDBTest1.java seem to happen because AlterTableConstantAction.compressTable() invalidates the prepared statements before it compresses the table. Executing statements may then try to recompile before the table has been compressed, and they continue using the old conglomerate since they're not invalidated again.

          I tried to delay the invalidation until the new compressed conglomerates had been created and the system tables were updated (see attached patch), and that made the CompressDBTest1 repro complete with no failures.

           There is a comment saying that the early invalidation was intentional, to work around a bug. An earlier version of the comment referred to bug 3653, where some statements didn't recompile if the invalidation happened later. I don't know what kind of statements had that problem, though. I'll run the regression tests to see if they show anything. (Truncate table has a similar comment and would probably need a similar fix.)

          There's always the possibility to invalidate statements both before and after compressing the table, but it really should be enough to do it once.
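
           Restated as pseudocode, the reordering tried in invalidate-after.diff is roughly the following (all names are illustrative, not the actual Derby source):

               // Simplified sketch of the compress path in AlterTableConstantAction.
               void compressTable() throws StandardException {
                   // Old behavior: invalidate here, before any catalog changes.
                   // A dependent statement may then recompile immediately and
                   // rebind to the old conglomerate id, which is about to go away.
                   //     invalidateDependentStatements();

                   copyRowsIntoNewConglomerates();  // build the compressed copies
                   updateSystemCatalogs();          // SYSCONGLOMERATES gets the new ids
                   dropOldConglomerates();

                   // New behavior: invalidate only after the data dictionary has
                   // been updated, so recompilations see the new conglomerate ids.
                   invalidateDependentStatements();
               }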

          Knut Anders Hatlen added a comment -

           Running the regression test suites didn't reveal any problems with the patch. I'll see if I can construct a case that runs into the problem mentioned in the comment:

          // invalidate any prepared statements that
          // depended on this table (including this one)
          // bug 3653 has threads that start up and block on our lock, but do
          // not see they have to recompile their plan. We now invalidate earlier
          // however they still might recompile using the old conglomerate id before we
          // commit our DD changes.

           The scenario described in the last sentence of that comment is exactly what happens when we see the bug.

          I'm not sure why other threads would fail to see that they need to recompile if the invalidation happens later. They may fail because the conglomerate doesn't exist anymore, but the result set classes have code that checks if the statement is still valid if an error happens, and recompiles and reexecutes if it's not valid, so that case is supposed to be handled. But maybe that code was added after bug 3653 had been fixed, I don't know...

          Knut Anders Hatlen added a comment -

          During SYSCS_COMPRESS_TABLE we invalidate dependent statements three times, and all of them happen before any changes have been made to the data dictionary. With the patch, this is changed to two invalidations before the DD changes and one after the DD is updated.

          The first two invalidations are done by the general alter table machinery. The compress table implementation was moved into alter table in 10.5 (DERBY-1062). It may be that the earlier invalidation was needed before that change, but now that we have the alter table machinery doing early invalidation in any case, the bug mentioned in the comments (bug 3653) is probably handled even if we move the compress-specific invalidation after the data dictionary changes.

          Knut Anders Hatlen added a comment -

          Attached is a patch which adds JUnit test cases that reproduce this bug with compress table and truncate table. The patch also moves the third invalidation on compress/truncate to a point where the new conglomerate information has been written to the system tables, so that the statements get recompiled using the new conglomerates.

          All the regression tests ran cleanly with the patch.
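
           The committed test uses the same helper-thread pattern as the standalone repros; a stripped-down sketch of the idea (illustrative table and database names; plain JDBC and JUnit instead of the actual CompressTableTest framework code):

               import java.sql.*;
               import junit.framework.TestCase;

               public class CompressSketchTest extends TestCase {
                   public void testConcurrentInvalidation() throws Exception {
                       final Throwable[] helperError = new Throwable[1];
                       Thread helper = new Thread(new Runnable() {
                           public void run() {
                               try {
                                   Connection c = DriverManager.getConnection("jdbc:derby:testdb");
                                   PreparedStatement ps = c.prepareStatement("SELECT * FROM T");
                                   for (int i = 0; i < 1000; i++) {
                                       ResultSet rs = ps.executeQuery();
                                       while (rs.next()) { /* drain the results */ }
                                       rs.close();
                                   }
                                   c.close();
                               } catch (Throwable t) {
                                   helperError[0] = t;
                               }
                           }
                       });
                       helper.start();

                       // Keep compressing the same table while the helper runs.
                       Connection c = DriverManager.getConnection("jdbc:derby:testdb");
                       CallableStatement cs = c.prepareCall(
                               "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'T', 1)");
                       while (helper.isAlive()) {
                           cs.execute();
                       }
                       helper.join();
                       c.close();
                       assertNull("Helper thread failed", helperError[0]);
                   }
               }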

          Knut Anders Hatlen added a comment -

          Committed revision 1142583.

          I also plan to backport the fix to the 10.8 branch.

          Knut Anders Hatlen added a comment -

          Merged to 10.8. Committed revision 1143356.

          Kristian Waagan added a comment -

          I'm still seeing this on a Linux machine:
          Fedora 15
          Linux work2 2.6.38.8-32.fc15.x86_64 #1 SMP Mon Jun 13 19:49:05 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
          Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
          java version "1.6.0_22"
          OpenJDK Runtime Environment (IcedTea6 1.10.2) (fedora-58.1.10.2.fc15-x86_64)
          OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

           To reproduce it, I ran two instances of the test in a tight loop in the shell. I've also seen it reproduce when using the parallel test runner.

           Last-minute update: it also reproduces on Solaris 11:
          java version "1.6.0_24"
          Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
          Java HotSpot(TM) Server VM (build 19.1-b02, mixed mode)

          Time: 5.035
          There was 1 failure:
          1) testConcurrentInvalidation(org.apache.derbyTesting.functionTests.tests.lang.CompressTableTest)junit.framework.AssertionFailedError: Helper thread failed
          at org.apache.derbyTesting.junit.BaseTestCase.fail(BaseTestCase.java:801)
          at org.apache.derbyTesting.functionTests.tests.lang.CompressTableTest.testConcurrentInvalidation(CompressTableTest.java:121)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:112)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          Caused by: java.sql.SQLException: The conglomerate (2,720) requested does not exist.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:98)
          at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:256)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:400)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:348)
          at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2290)
          at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:82)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1334)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeQuery(EmbedPreparedStatement.java:284)
          at org.apache.derbyTesting.functionTests.tests.lang.CompressTableTest$1.run(CompressTableTest.java:92)
          Caused by: java.sql.SQLException: The conglomerate (2,720) requested does not exist.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:122)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:71)
          ... 9 more
          Caused by: ERROR XSAI2: The conglomerate (2,720) requested does not exist.
          at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:278)
          at org.apache.derby.impl.store.access.heap.HeapConglomerateFactory.readConglomerate(HeapConglomerateFactory.java:254)
          at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:482)
          at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:394)
          at org.apache.derby.impl.store.access.RAMTransaction.getDynamicCompiledConglomInfo(RAMTransaction.java:692)
          at org.apache.derby.impl.sql.execute.TableScanResultSet.openCore(TableScanResultSet.java:245)
          at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.openCore(BulkTableScanResultSet.java:248)
          at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.open(BasicNoPutResultSetImpl.java:255)
          at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:436)
          at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:317)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)
          ... 3 more

          FAILURES!!!
          Tests run: 2, Failures: 1, Errors: 0

          Kristian Waagan added a comment -

          Forgot to mention that I used a sane build and ran from classes.

          Knut Anders Hatlen added a comment -

          I've also seen this happen frequently in my test runs on phoneME running inside a VirtualBox instance, so it seems that the fix was incomplete. I'll see if I can get it reproduced in a saner environment and do some debugging.

          Knut Anders Hatlen added a comment -

          I managed to reproduce it in my environment and have been running with an instrumented build to try to figure out what's going on. It looks to me as if the problem is that GenericPreparedStatement.makeInvalid() does nothing if the statement is in the process of being recompiled. That may be fine if the recompilation has just started, but if it's in the middle of the compilation, it may end up producing an execution plan that only works with the old conglomerate.

          I've experimented with the attached patch (invalidation-during-compilation.diff). That patch makes GPS.makeInvalid() set a flag if it attempted an invalidation during compilation, and GPS.executeStmt() checks the flag after compilation and requests recompilation if necessary.

           This makes the failure happen less frequently. It appears to completely fix the situations where makeInvalid() used to swallow the invalidation request and the recompilation completed successfully using stale metadata, which caused the execution of the plan to fail with a conglomerate not found exception. The stack trace Kristian posted was such a case.

          However, there's a variation that's not properly handled by the patch. If the conglomerate not found exception is thrown by the in-progress recompilation itself, because the conglomerate disappeared under its feet, something goes wrong. The patch does make it ignore the exception and recompile once more, the recompilation succeeds, and so does the execution of the plan. But when the transaction in which this happened is committed, the following exception is thrown:

          ERROR X0Y66: Cannot issue commit in a nested connection when there is a pending operation in the parent connection.

          Probably the context stack is in a bad state after the conglomerate not found exception and needs to be restored somehow, but I haven't dug into that yet.
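
           A minimal model of that flag-based approach (illustrative names; not the actual GenericPreparedStatement code):

               // An invalidation request that arrives mid-compilation is recorded
               // and honored once the compilation finishes.
               class PreparedPlan {
                   private boolean valid;
                   private boolean compiling;
                   private boolean invalidatedWhileCompiling;

                   synchronized void makeInvalid() {
                       if (compiling) {
                           // The plan being produced may be based on stale
                           // metadata, so remember that it must be redone.
                           invalidatedWhileCompiling = true;
                       } else {
                           valid = false;
                       }
                   }

                   void execute() {
                       while (true) {
                           synchronized (this) {
                               if (valid && !invalidatedWhileCompiling) {
                                   break; // plan is up to date
                               }
                               invalidatedWhileCompiling = false;
                               compiling = true;
                           }
                           recompile();
                           synchronized (this) {
                               compiling = false;
                               valid = true;
                               // If makeInvalid() fired during recompile(), the
                               // flag is set again and the loop compiles once more.
                           }
                       }
                       // ... run the generated plan ...
                   }

                   private void recompile() { /* compile against current catalogs */ }
               }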

          Knut Anders Hatlen added a comment -

          When I combined the invalidation-during-compilation.diff patch with the patch for DERBY-5161 (which had to be backed out because of DERBY-5280), I didn't see the error on commit anymore. I've been testing the patches by running three parallel processes of the attached Java program D4275.java.

          However, after 12 minutes, one of the test processes failed with a NullPointerException:

          java.lang.NullPointerException
          at org.apache.derby.impl.sql.compile.FromBaseTable.getStoreCostController(FromBaseTable.java:4515)
          at org.apache.derby.impl.sql.compile.FromBaseTable.getBaseCostController(FromBaseTable.java:4521)
          at org.apache.derby.impl.sql.compile.FromBaseTable.estimateCost(FromBaseTable.java:1428)
          at org.apache.derby.impl.sql.compile.OptimizerImpl.estimateTotalCost(OptimizerImpl.java:2626)
          at org.apache.derby.impl.sql.compile.OptimizerImpl.costBasedCostOptimizable(OptimizerImpl.java:2172)
          at org.apache.derby.impl.sql.compile.OptimizerImpl.costOptimizable(OptimizerImpl.java:1985)
          at org.apache.derby.impl.sql.compile.FromBaseTable.optimizeIt(FromBaseTable.java:524)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.optimizeIt(ProjectRestrictNode.java:316)
          at org.apache.derby.impl.sql.compile.OptimizerImpl.costPermutation(OptimizerImpl.java:1939)
          at org.apache.derby.impl.sql.compile.SelectNode.optimize(SelectNode.java:1913)
          at org.apache.derby.impl.sql.compile.DMLStatementNode.optimizeStatement(DMLStatementNode.java:315)
          at org.apache.derby.impl.sql.compile.CursorNode.optimizeStatement(CursorNode.java:587)
          at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:384)
          at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:85)
          at org.apache.derby.impl.sql.GenericPreparedStatement.rePrepare(GenericPreparedStatement.java:231)
          at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:412)
          at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:319)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeQuery(EmbedPreparedStatement.java:284)
          at D4275$1.run0(D4275.java:35)
          at D4275$1.run(D4275.java:23)

          I don't know if this is caused by the fix, or if it's some other existing problem that popped up by coincidence. The other two test processes didn't show any problems, and I terminated them after 30 minutes. Before I included the DERBY-5161 fix, the test would typically fail in less than a minute.

          If DERBY-5161 + invalidation-during-compilation.diff (+ whatever extra is needed to fix the NPE) is the right solution for this issue, we'll need to find out how to reintroduce the DERBY-5161 fix without reintroducing DERBY-5280.

          Dag H. Wanvik added a comment -

           Interesting find. Sounds like reintroducing a (better) fix for DERBY-5161 might be a useful first step. The cleanup/context-popping machinery is complex, though, so I'm just guessing. Thanks for digging into this.

          Knut Anders Hatlen added a comment -

          I had my test case running overnight to find out exactly what was causing the NPE (there were two references that could be null on the line number reported). I saw the NPE twice (after two and three hours). It looks like the NPE happens because a conglomerate is not found and a conglomerate descriptor ends up as null. So it sounds like a case where a conglomerate not found exception would be in order. If a StandardException is thrown instead of a NullPointerException, the code knows how to recover (recompile and retry).

          The attached patch (npe.diff) checks if a conglomerate descriptor was found, and raises a StandardException if not.
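
           The shape of that change, as described, is roughly the following (a fragment with assumed names; the SQLState constant for ERROR XSAI2 is an assumption):

               // If the descriptor lookup came back empty, the conglomerate was
               // dropped underneath us (e.g. by a concurrent compress). Raising
               // a StandardException lets the caller recover by recompiling and
               // retrying; a NullPointerException gives it no such chance.
               if (cd == null) {
                   throw StandardException.newException(
                           SQLState.STORE_CONGLOMERATE_DOES_NOT_EXIST, conglomId);
               }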

          Knut Anders Hatlen added a comment -

           All the regression tests ran cleanly with the npe.diff patch. The patch doesn't have a regression test case, as I don't have a reliable way to reproduce the NPE in reasonable time. I do think, however, that it can happen in the existing tests CompressTableTest and TruncateTableTest when the timing is right.

          Knut Anders Hatlen added a comment -

          Committed npe.diff to trunk with revision 1160597.

          Knut Anders Hatlen added a comment -

          Merged npe.diff to 10.8 and committed revision 1164499.

          Knut Anders Hatlen added a comment -

          I'm marking the issue as resolved again so that it turns up in the 10.8.2 release notes, since the original fix did solve some of the problems in this area.

          I've filed DERBY-5406 to track the remaining problems revealed by the regression test cases for this bug.

          Kathey Marsden added a comment -

           Reopen for 10.5 backport consideration. If working on the backport for this issue, temporarily assign yourself and add a comment that you are working on it. When finished, re-resolve with the new fix versions, or add the label backport_reject_10_x as appropriate.

          Mike Matrigali added a comment -

           Temporarily assigning the issue to myself while working on the backport.

          Kathey Marsden added a comment -

           This was reported by a user on 10.3. Updating the affects version.

          Mike Matrigali added a comment -

           I have finished backporting: backported to 10.7, 10.6, 10.5, 10.4, and 10.3. Reassigning to the original owner.

          Knut Anders Hatlen added a comment -

          [bulk update] Close all resolved issues that haven't been updated for more than one year.


            People

            • Assignee: Knut Anders Hatlen
            • Reporter: Sai Pullabhotla
            • Votes: 4
            • Watchers: 2
