DERBY-5406

Intermittent failures in CompressTableTest and TruncateTableTest

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.8.2.2, 10.9.1.0
    • Fix Version/s: 10.8.3.0, 10.9.1.0
    • Component/s: SQL
    • Labels:
      None
    • Bug behavior facts:
      Regression Test Failure

      Description

      The test cases CompressTableTest.testConcurrentInvalidation() and TruncateTableTest.testConcurrentInvalidation() fail intermittently with errors such as:

      ERROR XSAI2: The conglomerate (2,720) requested does not exist.

      The problem has been analyzed in the comments on DERBY-4275, and a patch attached to that issue (invalidation-during-compilation.diff) fixes the underlying race condition. However, that patch only works correctly together with the fix for DERBY-5161, which was backed out because it caused the regression DERBY-5280.

      We will therefore need to find a way to fix DERBY-5161 without reintroducing DERBY-5280 in order to resolve this issue.

      1. CompressAndPrepare.java
        2 kB
        Knut Anders Hatlen
      2. d5406-1a-detect-invalidation-during-compilation.diff
        4 kB
        Knut Anders Hatlen
      3. d5406-1b.diff
        4 kB
        Knut Anders Hatlen
      4. d5406-2a-invalidate-self.diff
        4 kB
        Knut Anders Hatlen
      5. d5406-3a.diff
        3 kB
        Knut Anders Hatlen
      6. d5406-4a-push-retry-logic.diff
        7 kB
        Knut Anders Hatlen
      7. d5406-4a-retry-on-conglomerate-error.diff
        8 kB
        Knut Anders Hatlen

          Activity

          Knut Anders Hatlen added a comment -

          The attached patch (d5406-1a-detect-invalidation-during-compilation.diff) improves the invalidation-during-compilation.diff patch attached to DERBY-4275 by restoring the state of the context stack before retrying the compilation. This prevents the "Cannot issue commit in a nested connection" errors seen with the original patch.

          Although the patch makes the failures happen less frequently, there still appear to be race conditions in this area. I've seen the following two failures when running the D4275.java repro attached to DERBY-4275:

          1) java.sql.SQLException: The conglomerate (1,136) requested does not exist.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:98)
          at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:256)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:400)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:348)
          at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2290)
          at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:82)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:150)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement20.<init>(EmbedPreparedStatement20.java:82)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement30.<init>(EmbedPreparedStatement30.java:63)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement40.<init>(EmbedPreparedStatement40.java:40)
          at org.apache.derby.jdbc.Driver40.newEmbedPreparedStatement(Driver40.java:107)
          at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1615)
          at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1443)
          at D4275$1.run0(D4275.java:32)
          at D4275$1.run(D4275.java:23)
          Caused by: java.sql.SQLException: The conglomerate (1,136) requested does not exist.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:122)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:71)
          ... 14 more
          Caused by: ERROR XSAI2: The conglomerate (1,136) requested does not exist.
          at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:278)
          at org.apache.derby.impl.store.access.heap.HeapConglomerateFactory.readConglomerate(HeapConglomerateFactory.java:254)
          at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:482)
          at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:394)
          at org.apache.derby.impl.store.access.RAMTransaction.getStaticCompiledConglomInfo(RAMTransaction.java:665)
          at org.apache.derby.impl.sql.compile.BaseJoinStrategy.fillInScanArgs1(BaseJoinStrategy.java:100)
          at org.apache.derby.impl.sql.compile.NestedLoopJoinStrategy.getScanArgs(NestedLoopJoinStrategy.java:252)
          at org.apache.derby.impl.sql.compile.FromBaseTable.getScanArguments(FromBaseTable.java:3496)
          at org.apache.derby.impl.sql.compile.FromBaseTable.generateResultSet(FromBaseTable.java:3186)
          at org.apache.derby.impl.sql.compile.FromBaseTable.generate(FromBaseTable.java:3113)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generateMinion(ProjectRestrictNode.java:1382)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generate(ProjectRestrictNode.java:1334)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generateMinion(ProjectRestrictNode.java:1382)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generate(ProjectRestrictNode.java:1334)
          at org.apache.derby.impl.sql.compile.ScrollInsensitiveResultSetNode.generate(ScrollInsensitiveResultSetNode.java:109)
          at org.apache.derby.impl.sql.compile.CursorNode.generate(CursorNode.java:637)
          at org.apache.derby.impl.sql.compile.StatementNode.generate(StatementNode.java:345)
          at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:472)
          at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:93)
          at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:1103)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:131)
          ... 8 more
          Test stopped after 2927 ms

          This error happens in a different code path, outside of the current retry logic.

          2) java.sql.SQLException: The conglomerate (136,832) requested does not exist.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:98)
          at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:256)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:400)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:348)
          at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2290)
          at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:82)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1334)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeQuery(EmbedPreparedStatement.java:284)
          at D4275$1.run0(D4275.java:35)
          at D4275$1.run(D4275.java:23)
          Caused by: java.sql.SQLException: The conglomerate (136,832) requested does not exist.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:122)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:71)
          ... 10 more
          Caused by: ERROR XSAI2: The conglomerate (136,832) requested does not exist.
          at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:278)
          at org.apache.derby.impl.sql.compile.FromBaseTable.bindNonVTITables(FromBaseTable.java:2352)
          at org.apache.derby.impl.sql.compile.FromList.bindTables(FromList.java:317)
          at org.apache.derby.impl.sql.compile.SelectNode.bindNonVTITables(SelectNode.java:489)
          at org.apache.derby.impl.sql.compile.DMLStatementNode.bindTables(DMLStatementNode.java:199)
          at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(DMLStatementNode.java:137)
          at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(CursorNode.java:253)
          at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:327)
          at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:85)
          at org.apache.derby.impl.sql.GenericPreparedStatement.rePrepare(GenericPreparedStatement.java:231)
          at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:414)
          at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:319)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)
          ... 4 more
          Test stopped after 78573 ms

          This error does go through the code path with the retry logic, but it doesn't trigger a retry, so it looks like the invalidation somehow gets lost.

          Knut Anders Hatlen added a comment -

          Attaching a new revision (1b) of the patch that prevents invalidation requests from getting lost if a compilation is already in progress. It adds some more comments, and also moves the restoring of the context stack out of the block synchronized on the GPS (GenericPreparedStatement), since it only modifies state local to the LCC (LanguageConnectionContext) and doesn't need statement-global synchronization.

          Committed revision 1175785.
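
The general idea of not losing an invalidation that arrives while a compilation is in progress can be illustrated with a minimal, single-threaded sketch. Everything in it (`PreparedPlan`, `makeInvalid`, `prepare`) is hypothetical and is not taken from the 1b patch or from Derby's GenericPreparedStatement; it only shows one possible remember-and-retry pattern.

```java
/** Hypothetical sketch; these are not Derby classes. */
class RetryOnInvalidationSketch {

    static class PreparedPlan {
        private boolean compiling;
        private boolean invalidatedWhileCompiling;
        private boolean valid;
        private int compilations;

        /** Called by DDL. Remembers the request even if a compilation is running. */
        synchronized void makeInvalid() {
            valid = false;
            if (compiling) {
                invalidatedWhileCompiling = true;
            }
        }

        synchronized boolean isValid() { return valid; }
        synchronized int compilations() { return compilations; }

        /** Compiles until one full compilation finishes without being invalidated. */
        void prepare(Runnable concurrentActivity) {
            boolean retry;
            do {
                synchronized (this) {
                    compiling = true;
                    invalidatedWhileCompiling = false;
                    compilations++;
                }
                // Stand-in for the real compilation work, during which
                // another thread could call makeInvalid().
                concurrentActivity.run();
                synchronized (this) {
                    compiling = false;
                    retry = invalidatedWhileCompiling;
                    valid = !retry;
                }
            } while (retry);
        }
    }

    /** Invalidates the plan during the first compilation only. */
    static PreparedPlan compileWithOneInvalidation() {
        PreparedPlan plan = new PreparedPlan();
        int[] calls = {0};
        plan.prepare(() -> {
            if (calls[0]++ == 0) {
                plan.makeInvalid();
            }
        });
        return plan;
    }

    public static void main(String[] args) {
        PreparedPlan plan = compileWithOneInvalidation();
        System.out.println("compilations=" + plan.compilations()
                + ", valid=" + plan.isValid());
    }
}
```

In this sketch the invalidation that arrives during the first compilation is recorded in a flag, so the loop compiles a second time and the plan ends up valid after two compilations.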

          Knut Anders Hatlen added a comment -

          Of the two stack traces mentioned above, I see (2) more frequently than (1). (I also sometimes see other stack traces, and I suspect there may be multiple holes.)

          Stack trace (2) is in fact the same problem that caused the NullPointerException fixed in DERBY-4275. The fix made it throw a StandardException instead, so that the retry logic would come into play. In some cases it actually does recover from that error, but apparently not always. Here's what I think is happening in FromBaseTable.bindNonVTITables() when this error occurs:

          1) The statement is in the process of being recompiled, and it builds the table descriptor at line 2190:

          TableDescriptor tableDescriptor = bindTableDescriptor();

          2) The statement's dependency on the table is registered at line 2341:

          /* This represents a table - query is dependent on the TableDescriptor */
          compilerContext.createDependency(tableDescriptor);

          3) It discovers that the conglomerate referred to by the table descriptor no longer exists at line 2351 and raises an exception:

          // Bail out if the descriptor couldn't be found. The conglomerate
          // probably doesn't exist anymore.
          if (baseConglomerateDescriptor == null) {
              throw StandardException.newException(
                  SQLState.STORE_CONGLOMERATE_DOES_NOT_EXIST,
                  new Long(tableDescriptor.getHeapConglomerateId()));
          }

          Now, the conglomerate disappeared some time after the table descriptor was built, because of a compress or truncate operation. If the dependency on the table had been registered before the conglomerate was removed, the compress/truncate operation will have invalidated the statement, so the retry logic knows it should try again.

          If the compress/truncate operation happened after the table descriptor was built, but before the dependency was registered, the statement will not be invalidated. In that case, the retry logic does not know that an invalidation has occurred, and it won't retry the compilation.

          So it looks like we either need to find a way to close the window between the calls to bindTableDescriptor() and createDependency(), or when this happens the statement should invalidate itself before it throws the exception.
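
The window described above can be modelled with a small deterministic sketch. The `Table` and `Statement` classes below are hypothetical stand-ins, not Derby's TableDescriptor or dependency manager; the comments map each line to the three steps listed in this comment, and the compress operation is placed either inside or after the window by a flag instead of by a real second thread.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; these are not Derby classes. */
class LostInvalidationSketch {

    /** A statement that DDL can mark as invalid. */
    static class Statement {
        boolean invalidated;
    }

    /** A table whose DDL notifies only the dependents registered so far. */
    static class Table {
        private final Set<Statement> dependents = new HashSet<>();
        long conglomerateId = 1;

        void registerDependent(Statement s) {
            dependents.add(s);
        }

        /** Models compress/truncate: new conglomerate, then invalidate dependents. */
        void compress() {
            conglomerateId++;
            for (Statement s : dependents) {
                s.invalidated = true;
            }
        }
    }

    /**
     * Simulates one compilation in which the conglomerate disappears.
     * Returns whether the retry logic would see an invalidation.
     */
    static boolean compile(boolean compressInsideWindow) {
        Table table = new Table();
        Statement stmt = new Statement();

        long snapshot = table.conglomerateId;       // step 1: bindTableDescriptor()
        if (compressInsideWindow) {
            table.compress();                       // stmt is not a dependent yet
        }
        table.registerDependent(stmt);              // step 2: createDependency()
        if (!compressInsideWindow) {
            table.compress();                       // stmt is notified
        }

        // step 3: the conglomerate from the snapshot no longer exists
        if (snapshot == table.conglomerateId) {
            throw new IllegalStateException("sketch expects a stale snapshot");
        }
        return stmt.invalidated;
    }

    public static void main(String[] args) {
        System.out.println("compress after registration: invalidated=" + compile(false));
        System.out.println("compress inside window:      invalidated=" + compile(true));
    }
}
```

In both runs the compilation hits a missing conglomerate, but only the first one leaves the statement marked as invalidated, which is the signal the retry logic relies on.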

          Knut Anders Hatlen added a comment -

          Attaching patch 2a which makes FromBaseTable.bindNonVTITables() invalidate the statement itself when it discovers that the conglomerate has disappeared. That way, if the conglomerate was dropped between bindTableDescriptor() and createDependency() so that the original invalidation was lost, we'll still invalidate the statement and make GenericPreparedStatement.executeStmt() detect that a recompilation is needed.

          I've run four parallel instances of the D4275 test case for 1.5 hours without seeing any instances of stack trace (2) mentioned in an earlier comment. That stack trace usually reproduces in 2 to 5 minutes on the same machine without the patch.

          A very similar stack trace was seen three times in those 1.5 hours. That exception was thrown at the exact same place in FromBaseTable, but the re-compilation had been started at a lower level, from GenericActivationHolder, instead of directly from GenericPreparedStatement.executeStmt().

          I think the reason it still fails when the compilation was started at a lower level is that the self-invalidation introduced by this patch is ignored because it happens while the statement is being compiled. This is the exact same problem as the one addressed by the 1b patch. However, the 1b patch only added logic to retry compilations started directly from GenericPreparedStatement.executeStmt(). So it looks like the retry logic from 1b must be enhanced to cover more cases.

          But, in any case, I think the 2a patch is an improvement on its own. It makes the failures happen less frequently, and I haven't noticed any new failures because of it.

          The full regression test suite is currently running. I plan to commit the patch if all the tests pass, and then I'll go on trying to fix the retry logic for the cases that are still missed out.
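
The self-invalidation idea can be sketched as follows. This is an illustration under stated assumptions, not the 2a patch: `Statement`, `bind` and `execute` are hypothetical names, the missing conglomerate is simulated with a boolean, and the retry loop stands in for whatever logic the caller uses to decide whether to recompile.

```java
/** Hypothetical sketch; these are not Derby classes. */
class SelfInvalidationSketch {

    static class ConglomerateMissingException extends Exception {
        ConglomerateMissingException() {
            super("The conglomerate requested does not exist.");
        }
    }

    static class Statement {
        boolean invalidated;
        int bindAttempts;
    }

    /** Binding step: optionally marks the statement invalid before failing. */
    static void bind(Statement stmt, boolean conglomerateExists, boolean selfInvalidate)
            throws ConglomerateMissingException {
        stmt.bindAttempts++;
        if (!conglomerateExists) {
            if (selfInvalidate) {
                stmt.invalidated = true;   // make the lost invalidation visible
            }
            throw new ConglomerateMissingException();
        }
    }

    /**
     * Caller with retry logic: retries only if the statement was invalidated.
     * Returns true if the statement eventually compiled.
     */
    static boolean execute(Statement stmt, boolean selfInvalidate) {
        boolean conglomerateExists = false;    // first attempt sees a stale descriptor
        while (true) {
            stmt.invalidated = false;
            try {
                bind(stmt, conglomerateExists, selfInvalidate);
                return true;
            } catch (ConglomerateMissingException e) {
                if (!stmt.invalidated) {
                    return false;              // no sign of invalidation: report the error
                }
                conglomerateExists = true;     // a retry binds against fresh metadata
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("without self-invalidation: " + execute(new Statement(), false));
        System.out.println("with self-invalidation:    " + execute(new Statement(), true));
    }
}
```

Without self-invalidation the caller has no reason to retry and surfaces the error; with it, the second bind attempt succeeds.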

          Kristian Waagan added a comment -

          Patch 2a looks like a good improvement to me too, Knut Anders. It's clean and concise.

          Just by looking at the patch I have one question: will getCurrentDependent() always return a dependent, or is null a valid return value as well?

          Knut Anders Hatlen added a comment -

          Thanks for looking at the patch, Kristian.

          I think getCurrentDependent() is guaranteed to return a non-null dependent at that location in the code for the following reasons:

          • JavaDoc for CompilerContext.setCurrentDependent() says

              This should be called at the start of a compile to
              register who has the dependencies needed for the compilation.

          so it sounds like the compiler context is expected to have a dependent once the compilation has started.

          • CompilerContextImpl.createDependency(Provider), which is called just a few lines before getCurrentDependent() is called, has the following assertion:

          SanityManager.ASSERT(currentDependent != null,
          "no current dependent for compilation");

          (and in insane builds it would fail with a NullPointerException) so we should not come to that point in the code if the dependent was null.

          Knut Anders Hatlen added a comment -

          Committed the 2a patch with revision 1187204.

          Knut Anders Hatlen added a comment -

          I mentioned that I still saw similar stack traces when the compilation was
          invoked from GenericActivationHolder.execute() instead of
          GenericPreparedStatement.executeStmt(), and suggested that
          GenericActivationHolder needed retry logic similar to the one in
          GenericPreparedStatement.

          The attached patch (d5406-3a.diff) takes a somewhat different approach. Instead
          of adding the extra logic to GenericActivationHolder, it makes
          GenericActivationHolder.execute() stop re-preparing the statement if it detects
          that it's using an outdated generated class. Instead, it just asks the prepared
          statement to give it the most recent version of the generated class.

          In most cases, it will receive an up-to-date version of the class, and it can
          continue without recompiling (the existing code would short-circuit the
          rePrepare() call in that case, so no changes in this scenario).

          If an invalidation happened after the last recompilation of the statement, the
          fresh version of the generated class will also be outdated. With the existing
          code, a recompilation would be requested immediately. With the patch, however,
          we just go ahead executing using the outdated class. The execution code already
          has checks for invalid plans, so it will be detected by the normal execution
          mechanisms. This has the advantage that the invalid plans will be reported in a
          way that GenericPreparedStatement.executeStmt() is able to detect, and the
          recompilation will be done by GenericPreparedStatement.executeStmt(). Since we
          already have the required retry logic in place there, re-invalidation of the
          statement during the recompilation will be detected and handled properly.

          (This is, by the way, the exact same thing as the existing code would do if the
          invalidation had happened right after we had fetched the fresh class. So this
          change could be seen as handling the two cases - invalidation right before
          retrieving the class and invalidation right after retrieving the class -
          uniformly.)

          Another edge case is that the returned generated class could be null. This
          happens if another thread was recompiling the statement when we retrieved the
          class. In that case, the patch makes GenericActivationHolder.execute() throw an
          exception with message id LANG_STATEMENT_NEEDS_RECOMPILE. This is a special
          kind of exception that GenericPreparedStatement.executeStmt() detects and takes
          as a signal to recompile the statement. Again, the recompilation will happen
          using the code that's already prepared for the need to retry in case of
          re-invalidations, so we should be covered if the conglomerate disappears during
          that compilation too.
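
A minimal sketch of the signalling pattern described above, not the actual Derby code: the holder never recompiles on its own, it only throws a "needs recompile" signal, and the caller that already owns the retry loop does the recompilation. The class `RecompileSignalSketch` and the nested `Prepared` and `NeedsRecompileException` types are hypothetical stand-ins for GenericPreparedStatement and the exception carrying LANG_STATEMENT_NEEDS_RECOMPILE; the method names only mirror the ones mentioned in the comment.

```java
import java.util.concurrent.atomic.AtomicReference;

public class RecompileSignalSketch {
    /** Hypothetical stand-in for the exception carrying LANG_STATEMENT_NEEDS_RECOMPILE. */
    static class NeedsRecompileException extends RuntimeException {}

    /** Hypothetical stand-in for a prepared statement that owns the generated class. */
    static class Prepared {
        // A null value models "another thread is recompiling right now".
        final AtomicReference<String> activationClass = new AtomicReference<>();
        int recompiles = 0;

        String getActivationClass() {
            return activationClass.get();
        }

        void rePrepare() {
            recompiles++;
            activationClass.set("generated-v" + recompiles);
        }
    }

    /** Models the holder: it asks for the latest class and only signals, never recompiles. */
    static String holderExecute(Prepared ps) {
        String generatedClass = ps.getActivationClass();
        if (generatedClass == null) {
            throw new NeedsRecompileException();
        }
        return "executed with " + generatedClass;
    }

    /** Models the caller that owns the retry logic. */
    static String executeStmt(Prepared ps) {
        while (true) {
            try {
                return holderExecute(ps);
            } catch (NeedsRecompileException e) {
                // Recompile here, where retries are already handled, then try again.
                ps.rePrepare();
            }
        }
    }

    public static void main(String[] args) {
        Prepared ps = new Prepared();
        System.out.println(executeStmt(ps)); // prints "executed with generated-v1"
    }
}
```

Keeping the recompilation in one place means only one code path has to cope with re-invalidation during compilation.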

          This also has the benefit that we can remove the workaround for DERBY-3260,
          where we added a synchronization block around the calls to rePrepare() and
          getActivationClass() to prevent a concurrent recompilation from making
          getActivationClass() return null.

          All the regression tests ran cleanly with the patch.

          I also ran my standard test case, four parallel processes of the D4275 class,
          for two hours without seeing any failures.

          Knut Anders Hatlen added a comment -

          Committed the 3a patch to trunk with revision 1189067.

          Knut Anders Hatlen added a comment -

          Attaching a class (CompressAndPrepare.java) that can be used to more easily reproduce the stack trace (1) in the first comment on this issue. This repro repeatedly prepares a query while another thread repeatedly compresses a table used by the query.

          The difference between the stack traces (1) and (2) is that the former happens in prepareStatement(), whereas the latter happens in executeQuery().

          When running the class on an idle system, it can run for a long time without showing the error (at least in my environment). But when putting on some background load, for example by running two instances of the repro concurrently, I get the error within seconds.

          Knut Anders Hatlen added a comment -

          The attached d5406-4a-push-retry-logic.diff patch attempts to fix stack trace (1) by pushing the retry logic further down into GenericStatement.prepare(). That method is used both when the compilation request comes from prepareStatement() and when it comes from the execution.

          I don't see stack trace (1) when I run the CompressAndPrepare repro. I saw a couple of occurrences of DERBY-5358, but no other errors.

          All the regression tests passed with the patch.

          More disappointingly, I saw an error appear a couple of times when I ran the D4275 repro. However, I also saw this error without the patch, so it looks like an existing hole, and not something caused by this patch. No idea why I didn't see it when I tested the 3a patch in the same environment. Here's the stack trace I saw:

          Caused by: ERROR XSAI2: The conglomerate (20 848) requested does not exist.
          at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:278)
          at org.apache.derby.impl.store.access.heap.HeapConglomerateFactory.readConglomerate(HeapConglomerateFactory.java:254)
          at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:482)
          at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:394)
          at org.apache.derby.impl.store.access.RAMTransaction.getStaticCompiledConglomInfo(RAMTransaction.java:665)
          at org.apache.derby.impl.sql.compile.BaseJoinStrategy.fillInScanArgs1(BaseJoinStrategy.java:100)
          at org.apache.derby.impl.sql.compile.NestedLoopJoinStrategy.getScanArgs(NestedLoopJoinStrategy.java:252)
          at org.apache.derby.impl.sql.compile.FromBaseTable.getScanArguments(FromBaseTable.java:3510)
          at org.apache.derby.impl.sql.compile.FromBaseTable.generateResultSet(FromBaseTable.java:3200)
          at org.apache.derby.impl.sql.compile.FromBaseTable.generate(FromBaseTable.java:3127)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generateMinion(ProjectRestrictNode.java:1382)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generate(ProjectRestrictNode.java:1334)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generateMinion(ProjectRestrictNode.java:1382)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generate(ProjectRestrictNode.java:1334)
          at org.apache.derby.impl.sql.compile.ScrollInsensitiveResultSetNode.generate(ScrollInsensitiveResultSetNode.java:109)
          at org.apache.derby.impl.sql.compile.CursorNode.generate(CursorNode.java:637)
          at org.apache.derby.impl.sql.compile.StatementNode.generate(StatementNode.java:345)
          at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:517)
          at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:97)
          at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:85)
          at org.apache.derby.impl.sql.GenericPreparedStatement.rePrepare(GenericPreparedStatement.java:231)
          at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:411)
          at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:319)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)
          ... 4 more

          Knut Anders Hatlen added a comment -

          Setting the patch available flag since the patch passes the regression tests, and the problems still seen with the stress test for this issue are also seen without the patch.

          Knut Anders Hatlen added a comment -

          Committed d5406-4a-push-retry-logic.diff to trunk revision 1190220.

          Then there's hopefully just one single exception left to fix before we can declare victory on this issue (see comment dated 26/Oct/11 16:35). I added some instrumentation and found that the statement did not have the invalidatedWhileCompiling flag set, which means that it wasn't retried because the invalidation was lost somehow.

          My guess is that its cause is similar to the case fixed by patch 2a. The statement is invalidated after we have built the table descriptor, but before we have registered the statement as a dependent of the table. However, contrary to the case fixed by 2a, the conglomerate isn't actually removed before we fetch the conglomerate descriptor, so the self-invalidation logic we added to the error handling for a missing conglomerate descriptor doesn't help in this case. The conglomerate is removed a little later, though, so the compilation will fail, but without the invalidation flag set, the compilation will not be retried.

          I'm wondering if a more robust approach would be to retry the compilation always if it fails because of a missing conglomerate. That's an error that will never be reported to the user unless there's a bug in Derby, I think, so retrying the compilation in those cases shouldn't be a problem. And if we get an error about the same conglomerate missing on the retry, we could report it to prevent infinite loops in case there actually is a problem that must be reported. In the case of a concurrent compress or truncate operation, we should find the new conglomerate when retrying. If the conglomerate is missing because of a drop operation, the recompilation will fail, but it should fail earlier because the system tables have been updated with the correct information, and the error message will be more informative (typically: "Table T does not exist" instead of "Conglomerate X does not exist").
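
A minimal sketch of the retry policy proposed above, under stated assumptions and not taken from the patch: the compilation is retried whenever it fails because of a missing conglomerate, but if the retry reports the same conglomerate as missing again, the error is rethrown so that a real problem cannot cause an infinite loop. `ConglomerateRetrySketch`, `MissingConglomerateException` and `compileWithRetry` are hypothetical names; the exception only stands in for error XSAI2.

```java
import java.util.function.Supplier;

public class ConglomerateRetrySketch {
    /** Hypothetical stand-in for the XSAI2 "conglomerate does not exist" error. */
    static class MissingConglomerateException extends RuntimeException {
        final long conglomerateId;

        MissingConglomerateException(long conglomerateId) {
            super("The conglomerate (" + conglomerateId + ") requested does not exist.");
            this.conglomerateId = conglomerateId;
        }
    }

    /**
     * Runs the compilation and retries it on a missing-conglomerate error.
     * Gives up when two consecutive attempts fail on the same conglomerate.
     */
    static <T> T compileWithRetry(Supplier<T> compile) {
        Long lastMissing = null;
        while (true) {
            try {
                return compile.get();
            } catch (MissingConglomerateException e) {
                if (lastMissing != null && lastMissing.longValue() == e.conglomerateId) {
                    // Same conglomerate missing again: report it instead of looping.
                    throw e;
                }
                lastMissing = e.conglomerateId;
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // The first attempt fails as if a concurrent compress had removed the
        // old conglomerate; the second attempt finds the new one.
        String plan = compileWithRetry(() -> {
            if (attempts[0]++ == 0) {
                throw new MissingConglomerateException(720L);
            }
            return "plan compiled on attempt " + attempts[0];
        });
        System.out.println(plan); // prints "plan compiled on attempt 2"
    }
}
```

A compilation that keeps failing on a different conglomerate each time would still be retried in this sketch; a bounded retry count would be one way to cap that, if it turned out to matter.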

          Knut Anders Hatlen added a comment -

          I experimented with always retrying the compilation if it failed with a "conglomerate does not exist" error; see the attached patch d5406-4a-retry-on-conglomerate-error.diff. That patch also backs out the changes in the 2a patch, since the case addressed by that fix will also be covered by the broader fix in the 4a patch.

          I ran four parallel processes of the D4275 repro for almost two hours, two of the processes with the fix and two without the fix. The processes that had the fix only had one occurrence of DERBY-5358, and no other errors. The processes that ran without the fix had about 30 errors.

          So this fix appears to take care of the remaining issues, at least those I'm able to reproduce. It is however just a workaround for a more fundamental problem with how we track dependencies between statements and conglomerates. I would have felt more comfortable if we found a way to fix the underlying issue that makes invalidation requests vanish. I'll do a little more digging before I give up...

          All regression tests ran cleanly with the 4a patch, except one intermittent failure (DERBY-5498) that didn't show up when rerunning the tests.

          Mike Matrigali added a comment -

          I have a user with a trigger referencing a bad conglomerate number. With the "right" set of circumstances could this issue cause a trigger to not get recompiled after a compress and be left with the old conglomerate number?

          Knut Anders Hatlen added a comment -

          Triggers use a different code path for doing the recompiling, but I think they also suffer from the problems with invalidations that get lost if they happen during compilation. I'm not sure if any of the fixes that have gone into this issue would help the trigger case. Probably not, since most of the new code lives in GenericStatement.prepare(), which on first look doesn't seem to be used when preparing a trigger.

          Mike Matrigali added a comment -

          Saw the following in the nightlies: 10.8 current branch, Windows, ibm15 JVM. This looks like another occurrence of this bug.

          http://people.apache.org/~myrnavl/derby_test_results/v10_8/windows/testlog/ibm15/1231438-suites.All_diff.txt

          1) testConcurrentInvalidation(org.apache.derbyTesting.functionTests.tests.lang.TruncateTableTest)junit.framework.AssertionFailedError: Helper thread failed
          at org.apache.derbyTesting.junit.BaseTestCase.fail(BaseTestCase.java:813)
          at org.apache.derbyTesting.functionTests.tests.lang.TruncateTableTest.testConcurrentInvalidation(TruncateTableTest.java:359)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:79)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:113)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          Caused by: java.sql.SQLException: The conglomerate (3696) requested does not exist.
          at org.apache.derby.client.am.SQLExceptionFactory.getSQLException(Unknown Source)
          at org.apache.derby.client.am.SqlException.getSQLException(Unknown Source)
          at org.apache.derby.client.am.PreparedStatement.executeQuery(Unknown Source)
          at org.apache.derbyTesting.functionTests.tests.lang.TruncateTableTest$1.run(TruncateTableTest.java:331)
          Caused by: org.apache.derby.client.am.SqlException: The conglomerate (3696) requested does not exist.
          at org.apache.derby.client.am.Statement.completeSqlca(Unknown Source)
          at org.apache.derby.client.am.Statement.completeOpenQuery(Unknown Source)
          at org.apache.derby.client.net.NetStatementReply.parseOpenQueryFailure(Unknown Source)
          at org.apache.derby.client.net.NetStatementReply.parseOPNQRYreply(Unknown Source)
          at org.apache.derby.client.net.NetStatementReply.readOpenQuery(Unknown Source)
          at org.apache.derby.client.net.StatementReply.readOpenQuery(Unknown Source)
          at org.apache.derby.client.net.NetStatement.readOpenQuery_(Unknown Source)
          at org.apache.derby.client.am.Statement.readOpenQuery(Unknown Source)
          at org.apache.derby.client.am.PreparedStatement.flowExecute(Unknown Source)
          at org.apache.derby.client.am.PreparedStatement.executeQueryX(Unknown Source)
          ... 2 more

          Mike Matrigali added a comment -

          Do you still think we should not check in your most recent patch for this issue? The patch seems good to me in that it will at least reduce the errors reported to the user. I didn't see much downside to this approach, as it only affects the error code path, with an added retry.

          I agree it would be best to fix the underlying issue, but this patch seems like a good incremental step to make the system better until we get there.

          Knut Anders Hatlen added a comment -

          Thanks for looking at the patch, Mike. I haven't come any closer to a good solution for the underlying problem, so I agree that it's better to check in the workaround for now. Committed revision 1234776.

          Since multiple fixes have been checked in as part of this issue, and each of them fixed actual problems, I think it is OK to mark this issue as resolved for now. If more problems of similar nature surface, new and more specific bug reports should be filed to get those problems fixed.

          Mike Matrigali added a comment -

          Are the fixes associated with this Derby issue appropriate for backport? I do see that it is not going to be a simple merge, as there are a number of svn commits and some backouts of other changes.

          Any idea how old the issues are? I assume they go back at least as far as 10.5.

          Knut Anders Hatlen added a comment -

          I think merging to 10.8 should be straightforward. Not sure how many conflicts to expect when merging further back. I suppose at least DERBY-4275 would have to be merged first.

          I think most of these issues go all the way back, but I haven't checked. Dag verified that the repro for DERBY-4275 failed on 10.4, and that's the repro used to show the problems in this JIRA issue too.

          Mike Matrigali added a comment -

          I am looking at backporting this fix to 10.8, so setting ownership to myself.

          Mike Matrigali added a comment -

          Backported the change from trunk to 10.8. Resetting to the original owner.

          Backporting further back may depend on other changes, as discussed in previous comments on this issue.

          The hope is that this backport will eliminate the current intermittent failures in the nightly tests against 10.8, as the fix did for trunk. Since this issue is now closed, any new failures seen in these two tests in versions that have the fix should be logged in a new issue, which can be linked back to this one.


            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Knut Anders Hatlen
            • Votes:
              0
              Watchers:
              0
