Derby
  1. Derby
  2. DERBY-4450

GROUP BY in an IN-subquery inside HAVING clause whose select list is subset of group by columns, gives NPE

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.3.1.4, 10.3.2.1, 10.3.3.0, 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.5.3.0
    • Fix Version/s: 10.3.3.1, 10.4.2.1, 10.5.3.1, 10.6.1.0
    • Component/s: SQL
    • Labels:
      None
    • Issue & fix info:
      Repro attached
    • Bug behavior facts:
      Regression

      Description

      Found this while working on DERBY-4397 for a similar case with ORDER BY inside an IN-subquery:

      Repro:

      create table t(i int not null, constraint c unique , j int, k int);
      insert into t values (1,10,1),(2,40,1),(3,45,1),(4,46,1),(5,90,1);
      select sum(j) from t group by i having i in (select i from t group by i,j );

      gives NPE:
      ERROR XJ001: Java exception: ': java.lang.NullPointerException'.
      java.sql.SQLException: Java exception: ': java.lang.NullPointerException'.
      at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95)
      at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:87)
      at org.apache.derby.impl.jdbc.Util.javaException(Util.java:244)
      at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:403)
      at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:398)
      at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346)
      at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2201)
      at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81)
      at org.apache.derby.impl.jdbc.EmbedResultSet.closeOnTransactionError(EmbedResultSet.java:4338)
      at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(EmbedResultSet.java:467)
      at org.apache.derby.impl.jdbc.EmbedResultSet.next(EmbedResultSet.java:371)
      at org.apache.derby.tools.JDBCDisplayUtil.indent_DisplayResults(JDBCDisplayUtil.java:382)
      at org.apache.derby.tools.JDBCDisplayUtil.indent_DisplayResults(JDBCDisplayUtil.java:338)
      at org.apache.derby.tools.JDBCDisplayUtil.indent_DisplayResults(JDBCDisplayUtil.java:241)
      at org.apache.derby.tools.JDBCDisplayUtil.DisplayResults(JDBCDisplayUtil.java:229)
      at org.apache.derby.impl.tools.ij.utilMain.displayResult(utilMain.java:448)
      at org.apache.derby.impl.tools.ij.utilMain.doCatch(utilMain.java:522)
      at org.apache.derby.impl.tools.ij.utilMain.runScriptGuts(utilMain.java:363)
      at org.apache.derby.impl.tools.ij.utilMain.go(utilMain.java:261)
      at org.apache.derby.impl.tools.ij.Main.go(Main.java:229)
      at org.apache.derby.impl.tools.ij.Main.mainCore(Main.java:184)
      at org.apache.derby.impl.tools.ij.Main.main(Main.java:75)
      at org.apache.derby.tools.ij.main(ij.java:59)
      Caused by: java.sql.SQLException: Java exception: ': java.lang.NullPointerException'.
      at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
      at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:119)
      at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70)
      ... 22 more
      Caused by: java.lang.NullPointerException
      at org.apache.derby.impl.sql.execute.BaseActivation.getColumnFromRow(BaseActivation.java:1451)
      at org.apache.derby.exe.acf81e0010x0125x09b5xf628x000003d4bf801.e5(Unknown Source)
      at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGeneratedClass.java:149)
      at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(ProjectRestrictResultSet.java:268)
      at org.apache.derby.impl.sql.execute.AnyResultSet.getNextRowCore(AnyResultSet.java:171)
      at org.apache.derby.exe.acf81e0010x0125x09b5xf628x000003d4bf801.g0(Unknown Source)
      at org.apache.derby.exe.acf81e0010x0125x09b5xf628x000003d4bf801.e3(Unknown Source)
      at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGeneratedClass.java:145)
      at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(ProjectRestrictResultSet.java:268)
      at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(BasicNoPutResultSetImpl.java:471)
      at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(EmbedResultSet.java:427)

      1. derby-4450.diff
        1 kB
        Dag H. Wanvik
      2. derby-4450b.diff
        3 kB
        Dag H. Wanvik
      3. derby-4450b.stat
        0.2 kB
        Dag H. Wanvik

        Issue Links

          Activity

          Hide
          Dag H. Wanvik added a comment - - edited

          The problem here is that an extra PRN is added for this case in
          SelectNode to project away the unselected group by columns. This PRN
          is added at the end of SelectNode#genProjectRestrict, ca line 1620.

          However, the predicate of the PRN (for the "IN") constructed in
          SubqueryNode#pushNewPredicate as a part of SubqueryNode#preprocess,
          has already happened, and knows nothing of the new PRN added by
          SelectNode for this case.

          The consequence is that the IN predicate at code generation time has a
          result column list corresponding to the inner Select node (later
          becoming a PRN), not the PRN sitting on top of it and thus has a wrong
          idea of what result set number to use at execution time. In this case,
          the subquery's rows are materialized and the SelectNode/PRN result set
          is closed after this is done. Closing clears the current row for that
          result set.

          But since the generated code "thinks" that the result set number for
          the SelectNode/PRN is sitting directly below it, it uses that result
          set number when it synthesizes the materialized UnionResultSet, it
          reuses that result set number for the UnionResultSet,
          cf. BaseActivation#materializeResultSetIfPossible.

          In reality, at run-time, that result set is two levels below the
          Subquery (AnyResultSet in this case). So, when the predicate is
          evaluated it gives the NPE, since it accesses a result set two levels
          below it, which has already been closed by the materialization, and
          would have been wrong anyway, since the projection PRN is necessary
          and the IN's PRN predicate needs to have the correct view of what's under it.

          A possible solution would be to reuse the result column list from the
          original Select in the PRN added by for group by's projection, so the
          the IN predicate would point to the right result set at code
          generation time. I think that trick is used elsewhere.

          Show
          Dag H. Wanvik added a comment - - edited The problem here is that an extra PRN is added for this case in SelectNode to project away the unselected group by columns. This PRN is added at the end of SelectNode#genProjectRestrict, ca line 1620. However, the predicate of the PRN (for the "IN") constructed in SubqueryNode#pushNewPredicate as a part of SubqueryNode#preprocess, has already happened, and knows nothing of the new PRN added by SelectNode for this case. The consequence is that the IN predicate at code generation time has a result column list corresponding to the inner Select node (later becoming a PRN), not the PRN sitting on top of it and thus has a wrong idea of what result set number to use at execution time. In this case, the subquery's rows are materialized and the SelectNode/PRN result set is closed after this is done. Closing clears the current row for that result set. But since the generated code "thinks" that the result set number for the SelectNode/PRN is sitting directly below it, it uses that result set number when it synthesizes the materialized UnionResultSet, it reuses that result set number for the UnionResultSet, cf. BaseActivation#materializeResultSetIfPossible. In reality, at run-time, that result set is two levels below the Subquery (AnyResultSet in this case). So, when the predicate is evaluated it gives the NPE, since it accesses a result set two levels below it, which has already been closed by the materialization, and would have been wrong anyway, since the projection PRN is necessary and the IN's PRN predicate needs to have the correct view of what's under it. A possible solution would be to reuse the result column list from the original Select in the PRN added by for group by's projection, so the the IN predicate would point to the right result set at code generation time. I think that trick is used elsewhere.
          Hide
          Bryan Pendleton added a comment -

          > select sum(j) from t group by i having i in (select i from t group by i,j );

          Is this legal? I thought that a HAVING clause could only mention items in the result column list.

          Show
          Bryan Pendleton added a comment - > select sum(j) from t group by i having i in (select i from t group by i,j ); Is this legal? I thought that a HAVING clause could only mention items in the result column list.
          Hide
          Dag H. Wanvik added a comment -

          Bryan, I thought that the requirement is that the HAVING column be present in the GROUP BY list?
          At least Derby barks if it isn't:

          ERROR 42X24: Column I is referenced in the HAVING clause but is not in the GROUP BY list.

          But I'll have a look, thanks!

          Show
          Dag H. Wanvik added a comment - Bryan, I thought that the requirement is that the HAVING column be present in the GROUP BY list? At least Derby barks if it isn't: ERROR 42X24: Column I is referenced in the HAVING clause but is not in the GROUP BY list. But I'll have a look, thanks!
          Hide
          Dag H. Wanvik added a comment -

          Btw, adding "i" to the column list doesn't change the outcome:

          select sum(j),i from t group by i having i in (select i from t group by i,j );

          gives the same NPE.

          Show
          Dag H. Wanvik added a comment - Btw, adding "i" to the column list doesn't change the outcome: select sum(j),i from t group by i having i in (select i from t group by i,j ); gives the same NPE.
          Hide
          Bryan Pendleton added a comment -

          You're right of course. Column I is the grouping column, and so can certainly be in the HAVING clause.

          Sorry for the red herring.

          Show
          Bryan Pendleton added a comment - You're right of course. Column I is the grouping column, and so can certainly be in the HAVING clause. Sorry for the red herring.
          Hide
          Dag H. Wanvik added a comment -

          Uploading a patch which makes this example work, but keeping the top RCL when adding the PRN for the projection. The GroupByTest still works, running full regressions. Not for commit; I need to add a new test case as well.

          Show
          Dag H. Wanvik added a comment - Uploading a patch which makes this example work, but keeping the top RCL when adding the PRN for the projection. The GroupByTest still works, running full regressions. Not for commit; I need to add a new test case as well.
          Hide
          Dag H. Wanvik added a comment -

          Regressions ran cleanly.

          Show
          Dag H. Wanvik added a comment - Regressions ran cleanly.
          Hide
          Dag H. Wanvik added a comment -

          Attaching derby-4450b which adds a new test case for this bug. Ready for review.

          Show
          Dag H. Wanvik added a comment - Attaching derby-4450b which adds a new test case for this bug. Ready for review.
          Hide
          Dag H. Wanvik added a comment -

          This bug first appeared in 10.3, marking as regression.

          Show
          Dag H. Wanvik added a comment - This bug first appeared in 10.3, marking as regression.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for the thorough explanation, Dag. The fix looks clear and correct as far as I can see. +1 to commit.

          Nit: Using the helper method BaseJDBCTestCase.setAutoCommit() would allow you to skip the call to getConnection() in the test case.

          Show
          Knut Anders Hatlen added a comment - Thanks for the thorough explanation, Dag. The fix looks clear and correct as far as I can see. +1 to commit. Nit: Using the helper method BaseJDBCTestCase.setAutoCommit() would allow you to skip the call to getConnection() in the test case.
          Hide
          Dag H. Wanvik added a comment -

          This regression was introduced by svn 516454 (DERBY-681).

          Show
          Dag H. Wanvik added a comment - This regression was introduced by svn 516454 ( DERBY-681 ).
          Hide
          Dag H. Wanvik added a comment -

          Committed on trunk as svn 882732, including Knut's proposed simplification; thanks for looking at this!

          Show
          Dag H. Wanvik added a comment - Committed on trunk as svn 882732, including Knut's proposed simplification; thanks for looking at this!
          Hide
          Dag H. Wanvik added a comment -

          Backported to 10.3 as svn 882958.
          Backported to 10.4 as svn 882965.
          Backported to 10.5 as svn 882966, closing.

          Show
          Dag H. Wanvik added a comment - Backported to 10.3 as svn 882958. Backported to 10.4 as svn 882965. Backported to 10.5 as svn 882966, closing.

            People

            • Assignee:
              Dag H. Wanvik
              Reporter:
              Dag H. Wanvik
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development