Derby
  1. Derby
  2. DERBY-4330

NullPointerException or assert failure when re-executing PreparedStatement after lock timeout

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.3.1.4, 10.3.2.1, 10.3.3.0, 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.6.1.0
    • Fix Version/s: 10.3.3.1, 10.4.2.1, 10.5.3.1, 10.6.1.0
    • Component/s: SQL
    • Labels:
      None
    • Issue & fix info:
      Repro attached
    • Bug behavior facts:
      Regression

      Description

      I came across a query that failed with a NullPointerException (insane jars) or an assert failure (sane jars) when a PreparedStatement was re-executed after a lock timeout. I'm able to reproduce this on 10.3.1.4 and later. 10.2.2.0 and earlier don't fail. Another fallout from DERBY-827? I've also seen other manifestations of the problem, apparently depending on the actual rows in the tables, including "No current connection" and "The heap container with container id Container(0, 1120) is closed".

      Stack trace for the assert failure:

      org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED JoinResultSet already open
      at org.apache.derby.shared.common.sanity.SanityManager.ASSERT(SanityManager.java:120)
      at org.apache.derby.impl.sql.execute.JoinResultSet.openCore(JoinResultSet.java:144)
      at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.openCore(ProjectRestrictResultSet.java:169)
      at org.apache.derby.impl.sql.execute.SortResultSet.openCore(SortResultSet.java:248)
      at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.openCore(ProjectRestrictResultSet.java:169)
      at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.open(BasicNoPutResultSetImpl.java:248)
      at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:416)
      at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:297)
      at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1235)
      at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1675)
      at org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(EmbedPreparedStatement.java:1330)

      1. derby-4330c.stat
        0.6 kB
        Dag H. Wanvik
      2. derby-4330c.diff
        21 kB
        Dag H. Wanvik
      3. derby-4330b.stat
        0.6 kB
        Dag H. Wanvik
      4. derby-4330b.diff
        21 kB
        Dag H. Wanvik
      5. repro-intersect.sql
        0.9 kB
        Dag H. Wanvik
      6. repro-union.sql
        0.9 kB
        Dag H. Wanvik
      7. derby-4330a.diff
        0.7 kB
        Dag H. Wanvik
      8. repro.sql
        2 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Knut Anders Hatlen created issue -
          Hide
          Knut Anders Hatlen added a comment -

          Here's a script that reproduces the bug. I see this when I run the script with sane trunk:

          $ java -jar derbyrun.jar ij repro.sql
          .
          .
          .
          ij(C1)> execute ps;
          ERROR 40XL1: A lock could not be obtained within the time requested
          ij(C1)> set connection c2;
          ij(C2)> rollback;
          ij(C2)> set connection c1;
          ij(C1)> execute ps;
          ERROR XJ001: Java exception: 'ASSERT FAILED JoinResultSet already open: org.apache.derby.shared.common.sanity.AssertFailure'.

          Show
          Knut Anders Hatlen added a comment - Here's a script that reproduces the bug. I see this when I run the script with sane trunk: $ java -jar derbyrun.jar ij repro.sql . . . ij(C1)> execute ps; ERROR 40XL1: A lock could not be obtained within the time requested ij(C1)> set connection c2; ij(C2)> rollback; ij(C2)> set connection c1; ij(C1)> execute ps; ERROR XJ001: Java exception: 'ASSERT FAILED JoinResultSet already open: org.apache.derby.shared.common.sanity.AssertFailure'.
          Knut Anders Hatlen made changes -
          Field Original Value New Value
          Attachment repro.sql [ 12415136 ]
          Knut Anders Hatlen made changes -
          Issue & fix info [Repro attached]
          Hide
          Knut Anders Hatlen added a comment -

          This started with revision 540921, which was the commit that enabled the DERBY-827 changes.

          Show
          Knut Anders Hatlen added a comment - This started with revision 540921, which was the commit that enabled the DERBY-827 changes.
          Knut Anders Hatlen made changes -
          Link This issue is related to DERBY-827 [ DERBY-827 ]
          Hide
          Dag H. Wanvik added a comment -

          This trial patch makes the repro work (not for commit). I agree result set reuse is the problem here.

          There are at least two JoinResultSets here (1 and 2 below) that are marked as open, even when the parent ProjectRestrictResultSet is not so marked (since openCore failed with the timeout).

          So, at reuse time, the top level ProjectRestrictResult skips closing the underlying JoinResultSet structure, because it itself is not marked opened. The patch works around this by cleaning up children if openCore fails, effectively undoing the setting of the isOpen flag.

          1) The top JoinResultSet's openCore sets its isOpened flag before it attempts to open the children result sets. This triggers the seen error when the result set is reused (opened again).
          2) Even if that is fixed by moving the setting of isOpened to the end of the method, there is another problem: A lower level left result set does open correctly, but right child result set in a join fails with the timeout. The left one is thus already open at reuse openCore time. So just moving the flag setting is not sufficient.

          Show
          Dag H. Wanvik added a comment - This trial patch makes the repro work (not for commit). I agree result set reuse is the problem here. There are at least two JoinResultSets here (1 and 2 below) that are marked as open, even when the parent ProjectRestrictResultSet is not so marked (since openCore failed with the timeout). So, at reuse time, the top level ProjectRestrictResult skips closing the underlying JoinResultSet structure, because it itself is not marked opened. The patch works around this by cleaning up children if openCore fails, effectively undoing the setting of the isOpen flag. 1) The top JoinResultSet's openCore sets its isOpened flag before it attempts to open the children result sets. This triggers the seen error when the result set is reused (opened again). 2) Even if that is fixed by moving the setting of isOpened to the end of the method, there is another problem: A lower level left result set does open correctly, but right child result set in a join fails with the timeout. The left one is thus already open at reuse openCore time. So just moving the flag setting is not sufficient.
          Dag H. Wanvik made changes -
          Attachment derby-4330a.diff [ 12415207 ]
          Dag H. Wanvik made changes -
          Issue & fix info [Repro attached] [Patch Available, Repro attached]
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for looking at this, Dag. I'll just mention that the patch also fixes the "No current connection" (sane) and "The heap container with container id Container(0, 1120) is closed" (insane) errors that I've seen on a database with slightly different rows.

          Show
          Knut Anders Hatlen added a comment - Thanks for looking at this, Dag. I'll just mention that the patch also fixes the "No current connection" (sane) and "The heap container with container id Container(0, 1120) is closed" (insane) errors that I've seen on a database with slightly different rows.
          Hide
          Dag H. Wanvik added a comment - - edited

          Uploading a new repro, showing another incarnation of this problem, this time with a UNION query, which relies on the UnionResultSet. That class' openCore has a similar weakness as JoinResultSet's.

          With sane code I see this:
          ERROR XJ001: Java exception: 'ASSERT FAILED UnionResultSet already open: org.apache.derby.shared.common.sanity.AssertFailure'.

          Show
          Dag H. Wanvik added a comment - - edited Uploading a new repro, showing another incarnation of this problem, this time with a UNION query, which relies on the UnionResultSet. That class' openCore has a similar weakness as JoinResultSet's. With sane code I see this: ERROR XJ001: Java exception: 'ASSERT FAILED UnionResultSet already open: org.apache.derby.shared.common.sanity.AssertFailure'.
          Dag H. Wanvik made changes -
          Attachment repro-union.sql [ 12415372 ]
          Hide
          Dag H. Wanvik added a comment -

          Modifying repro-union.sql to use INTERSECT/EXCEPT combined with an ORDER BY where one of the tables is locked, shows the weakness in SetOpResultSet also. Furthermore SortResultSet has the weakness (seen with the UNION case above). I suspect there are more cases of this...

          Show
          Dag H. Wanvik added a comment - Modifying repro-union.sql to use INTERSECT/EXCEPT combined with an ORDER BY where one of the tables is locked, shows the weakness in SetOpResultSet also. Furthermore SortResultSet has the weakness (seen with the UNION case above). I suspect there are more cases of this...
          Dag H. Wanvik made changes -
          Attachment repro-intersect.sql [ 12415378 ]
          Hide
          Dag H. Wanvik added a comment - - edited

          As I suspected, there were more result sets that had to be fixed for this weakness.
          I upload a new patch, derby-4330b, which fixes the following result sets:

          DistinctGroupedAggregateResultSet
          DistinctScalarAggregateResultSet
          GroupedAggregateResultSet
          JoinResultSet
          SetOpResultSet
          SortResultSet
          UnionResultSet

          What they have in common, is that their openCore methods may fail without rolling back any successful opening of underlying result sets. The new code adds rollback by closing the underlying result sets before openCore of the parent is allowed to fail.

          The initial patch's fix to JoinResultSet is remade to be more targeted.

          The patch also adds test cases for these to ResultSetsFromPreparedStatementTest.

          Regressions ran ok on an an earlier version of the patch, running them over again now. Please review.

          Show
          Dag H. Wanvik added a comment - - edited As I suspected, there were more result sets that had to be fixed for this weakness. I upload a new patch, derby-4330b, which fixes the following result sets: DistinctGroupedAggregateResultSet DistinctScalarAggregateResultSet GroupedAggregateResultSet JoinResultSet SetOpResultSet SortResultSet UnionResultSet What they have in common, is that their openCore methods may fail without rolling back any successful opening of underlying result sets. The new code adds rollback by closing the underlying result sets before openCore of the parent is allowed to fail. The initial patch's fix to JoinResultSet is remade to be more targeted. The patch also adds test cases for these to ResultSetsFromPreparedStatementTest. Regressions ran ok on an an earlier version of the patch, running them over again now. Please review.
          Dag H. Wanvik made changes -
          Attachment derby-4330b.diff [ 12415498 ]
          Attachment derby-4330b.stat [ 12415499 ]
          Dag H. Wanvik made changes -
          Assignee Dag H. Wanvik [ dagw ]
          Hide
          Rick Hillegas added a comment -

          Thanks for the patch, Dag. Here's my understanding of the general problem you have found: When a ResultSet encounters an error at open() time, it fails to close its ResultSet children. I wonder if there is another version of this bug hanging around in the ResultSet.close() methods themselves: If an error occurs during the close of one child of a ResultSet, will the other children be closed or will they dangle open? It may be that you will bag more instances of this bug if you do the following:

          1) Make the ResultSet.close() methods more defensive so that they continue closing the rest of their children even if one child raises an error during close().

          2) Have the ResultSet call its own close() method if an error occurs during open().

          Thanks,
          -Rick

          Show
          Rick Hillegas added a comment - Thanks for the patch, Dag. Here's my understanding of the general problem you have found: When a ResultSet encounters an error at open() time, it fails to close its ResultSet children. I wonder if there is another version of this bug hanging around in the ResultSet.close() methods themselves: If an error occurs during the close of one child of a ResultSet, will the other children be closed or will they dangle open? It may be that you will bag more instances of this bug if you do the following: 1) Make the ResultSet.close() methods more defensive so that they continue closing the rest of their children even if one child raises an error during close(). 2) Have the ResultSet call its own close() method if an error occurs during open(). Thanks, -Rick
          Hide
          Dag H. Wanvik added a comment -

          Thanks for looking at this so quickly, Rick!

          > Thanks for the patch, Dag. Here's my understanding of the general
          > problem you have found: When a ResultSet encounters an error at open()
          > time, it fails to close its ResultSet children.

          That is correct.

          > I wonder if there is another version of this bug hanging around in
          > the ResultSet.close() methods themselves: If an error occurs during
          > the close of one child of a ResultSet, will the other children be
          > closed or will they dangle open? It may be that you will bag more
          > instances of this bug if you do the following:
          >
          > 1) Make the ResultSet.close() methods more defensive so that they
          > continue closing the rest of their children even if one child raises
          > an error during close().

          This could be done, sure. However, can you see a scenario where an
          error at close time would be retryable (like time-out at open time) ?

          As the code in the patch currently stands, it does not check if the
          exception is a a retryable one, admitted, it just goes ahead and
          closes the child result set(s). In addition to time-out, I imagine
          dead-lock would be a possible candidate candidate, as well as missing
          permissions (which is why the new defensive code does not check for
          time-out explicitly).

          >
          > 2) Have the ResultSet call its own close() method if an error occurs
          > during open().

          What I did for now, was to make sure the open method does not mark the
          result set as open before all other actions have been successfully
          performed, cf. the change to UnionResultSet. Would calling close() in
          addition add value to my approach?
          (I was hesitant to apply to much "power", here without understanding if it's needed

          Perhaps, for those result sets which also make calls to getNextRowCore before they complete openCore (e.g. GroupedAggregateResultSet) need some extra logic
          like that typically contained in close, like e.g. clearCurrentRow, i.e. there is more state to roll back than just closing the underlying result sets. I'll have a look..

          Show
          Dag H. Wanvik added a comment - Thanks for looking at this so quickly, Rick! > Thanks for the patch, Dag. Here's my understanding of the general > problem you have found: When a ResultSet encounters an error at open() > time, it fails to close its ResultSet children. That is correct. > I wonder if there is another version of this bug hanging around in > the ResultSet.close() methods themselves: If an error occurs during > the close of one child of a ResultSet, will the other children be > closed or will they dangle open? It may be that you will bag more > instances of this bug if you do the following: > > 1) Make the ResultSet.close() methods more defensive so that they > continue closing the rest of their children even if one child raises > an error during close(). This could be done, sure. However, can you see a scenario where an error at close time would be retryable (like time-out at open time) ? As the code in the patch currently stands, it does not check if the exception is a a retryable one, admitted, it just goes ahead and closes the child result set(s). In addition to time-out, I imagine dead-lock would be a possible candidate candidate, as well as missing permissions (which is why the new defensive code does not check for time-out explicitly). > > 2) Have the ResultSet call its own close() method if an error occurs > during open(). What I did for now, was to make sure the open method does not mark the result set as open before all other actions have been successfully performed, cf. the change to UnionResultSet. Would calling close() in addition add value to my approach? (I was hesitant to apply to much "power", here without understanding if it's needed Perhaps, for those result sets which also make calls to getNextRowCore before they complete openCore (e.g. GroupedAggregateResultSet) need some extra logic like that typically contained in close, like e.g. clearCurrentRow, i.e. there is more state to roll back than just closing the underlying result sets. I'll have a look..
          Hide
          Dag H. Wanvik added a comment -

          I upload derby-4330c, which uses the result sets' own close methods instead of
          directly calling close on the underlying result sets, addressing Rick's point 2.

          This should clean up any additional state, provided the close methods work correctly in this (new) "half-opened" context. In my repro tests, the close methods did the job and did not throw. I did inspect the result sets that have additional state and it seems this should work OK provided the underlying close methods don't throw, but I am not absolutely sure (Rick's point 1 is still not addressed in the code).

          Show
          Dag H. Wanvik added a comment - I upload derby-4330c, which uses the result sets' own close methods instead of directly calling close on the underlying result sets, addressing Rick's point 2. This should clean up any additional state, provided the close methods work correctly in this (new) "half-opened" context. In my repro tests, the close methods did the job and did not throw. I did inspect the result sets that have additional state and it seems this should work OK provided the underlying close methods don't throw, but I am not absolutely sure (Rick's point 1 is still not addressed in the code).
          Dag H. Wanvik made changes -
          Attachment derby-4330c.diff [ 12415536 ]
          Attachment derby-4330c.stat [ 12415537 ]
          Hide
          Rick Hillegas added a comment -

          Thanks for the new patch, Dag. This looks simpler.

          Show
          Rick Hillegas added a comment - Thanks for the new patch, Dag. This looks simpler.
          Hide
          Dag H. Wanvik added a comment -

          I will commit this shortly if there are no further review comments.

          Show
          Dag H. Wanvik added a comment - I will commit this shortly if there are no further review comments.
          Hide
          Dag H. Wanvik added a comment -

          Commited derby-4330c on trunk as svn 804271.

          Show
          Dag H. Wanvik added a comment - Commited derby-4330c on trunk as svn 804271.
          Dag H. Wanvik made changes -
          Issue & fix info [Repro attached, Patch Available] [Repro attached]
          Hide
          Dag H. Wanvik added a comment - - edited

          Backported to 10.5 branch as svn 804279 and 804281.

          [2009-08-21: Knut found an accidental file committed. Fixed as svn 806401 and 806403]

          Show
          Dag H. Wanvik added a comment - - edited Backported to 10.5 branch as svn 804279 and 804281. [2009-08-21: Knut found an accidental file committed. Fixed as svn 806401 and 806403]
          Hide
          Dag H. Wanvik added a comment -

          Backported to 10.4 branch as svn 804327

          Show
          Dag H. Wanvik added a comment - Backported to 10.4 branch as svn 804327
          Hide
          Dag H. Wanvik added a comment -

          Backported to 10.3 branch as svn 804334. resolving.

          Show
          Dag H. Wanvik added a comment - Backported to 10.3 branch as svn 804334. resolving.
          Dag H. Wanvik made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 10.3.4.0 [ 12313653 ]
          Fix Version/s 10.4.3.0 [ 12313654 ]
          Fix Version/s 10.5.3.1 [ 12314154 ]
          Fix Version/s 10.6.0.0 [ 12313727 ]
          Resolution Fixed [ 1 ]
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for fixing this, Dag. The back-port to 10.5 made some changes to DatabaseMetaDataTest (disabled some test cases and added calls to System.err.println()). Were these intentional?

          Show
          Knut Anders Hatlen added a comment - Thanks for fixing this, Dag. The back-port to 10.5 made some changes to DatabaseMetaDataTest (disabled some test cases and added calls to System.err.println()). Were these intentional?
          Hide
          Knut Anders Hatlen added a comment -

          It also made some changes to DRDAConnThread.

          Show
          Knut Anders Hatlen added a comment - It also made some changes to DRDAConnThread.
          Hide
          Dag H. Wanvik added a comment -

          Probably not. I probably had forgotten to sanitize my 10.5 sandbox
          Thanks for catching that, Knut, I'll have a look.

          Show
          Dag H. Wanvik added a comment - Probably not. I probably had forgotten to sanitize my 10.5 sandbox Thanks for catching that, Knut, I'll have a look.
          Hide
          Dag H. Wanvik added a comment -

          Fixed the unintentional commit of cruft in DatabaseMetaDataTest to 10.5. The DRDAConnThread
          cruft had been fixed already. Thanks for noticing, Knut.
          Fixed as svn 806401 and 806403.

          Show
          Dag H. Wanvik added a comment - Fixed the unintentional commit of cruft in DatabaseMetaDataTest to 10.5. The DRDAConnThread cruft had been fixed already. Thanks for noticing, Knut. Fixed as svn 806401 and 806403.
          Dag H. Wanvik made changes -
          Fix Version/s 10.5.3.1 [ 12314182 ]
          Hide
          Knut Anders Hatlen added a comment -

          Verified fix on trunk. Closing.

          Show
          Knut Anders Hatlen added a comment - Verified fix on trunk. Closing.
          Knut Anders Hatlen made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Dag H. Wanvik made changes -
          Fix Version/s 10.5.4.0 [ 12314154 ]
          Dag H. Wanvik made changes -
          Fix Version/s 10.4.2.1 [ 12313401 ]
          Fix Version/s 10.4.3.0 [ 12313654 ]
          Dag H. Wanvik made changes -
          Fix Version/s 10.3.3.1 [ 12313143 ]
          Fix Version/s 10.3.4.0 [ 12313653 ]
          Gavin made changes -
          Workflow jira [ 12472056 ] Default workflow, editable Closed status [ 12799036 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Resolved Resolved
          14d 1h 22m 1 Dag H. Wanvik 14/Aug/09 19:31
          Resolved Resolved Closed Closed
          171d 14h 6m 1 Knut Anders Hatlen 02/Feb/10 09:37

            People

            • Assignee:
              Dag H. Wanvik
              Reporter:
              Knut Anders Hatlen
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development