Derby
  1. Derby
  2. DERBY-4391

NullPointerException when comparing indexed column with result from a set operation

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.2.2.0, 10.3.3.0, 10.4.2.0, 10.5.3.0, 10.6.1.0
    • Fix Version/s: 10.5.3.1, 10.6.1.0
    • Component/s: SQL
    • Labels:
      None
    • Issue & fix info:
      Repro attached

      Description

      I'm reporting this issue on behalf of Bernt M. Johnsen.

      If an indexed column is compared with a UNION query (or some other set operation), a NullPointerException is raised, as can be seen by this sequence of statements in ij:

      ij> create table t(a int not null primary key, b int);
      0 rows inserted/updated/deleted
      ij> select * from t where a < (values 4 union values 4);
      ERROR XJ001: Java exception: ': java.lang.NullPointerException'.

      1. d4391-1b.diff
        4 kB
        Knut Anders Hatlen
      2. d4391-1a.stat
        0.2 kB
        Knut Anders Hatlen
      3. d4391-1a.diff
        4 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Hide
          Knut Anders Hatlen added a comment -

          Merged fix to 10.5 and committed revision 825312.

          Show
          Knut Anders Hatlen added a comment - Merged fix to 10.5 and committed revision 825312.
          Hide
          Dag H. Wanvik added a comment -

          Right, thx. The comment could use an update, then!

          Show
          Dag H. Wanvik added a comment - Right, thx. The comment could use an update, then!
          Hide
          Knut Anders Hatlen added a comment -

          > Btw, I saw this inside setUnionResultExpression
          >
          > /* DB2 requires both sides of union to have same name for the result to
          > * have that name. Otherwise, leave it or set it to a generated name */
          >
          > Wonder what the standard says about this?

          Sounds like it's compliant. See SQL:2003, part 2, 7.13 <query expression>, syntax rules:

          16) If a <query term> immediately contains a set operator, then:
          a) Let C be the <column name> of the i-th column of T1. If the <column name> of the i-th column of T2
          is C, then the <column name> of the i-th column of TR is C; otherwise, the <column name> of the i-th
          column of TR is implementation-dependent.

          Show
          Knut Anders Hatlen added a comment - > Btw, I saw this inside setUnionResultExpression > > /* DB2 requires both sides of union to have same name for the result to > * have that name. Otherwise, leave it or set it to a generated name */ > > Wonder what the standard says about this? Sounds like it's compliant. See SQL:2003, part 2, 7.13 <query expression>, syntax rules: 16) If a <query term> immediately contains a set operator, then: a) Let C be the <column name> of the i-th column of T1. If the <column name> of the i-th column of T2 is C, then the <column name> of the i-th column of TR is C; otherwise, the <column name> of the i-th column of TR is implementation-dependent.
          Hide
          Knut Anders Hatlen added a comment -

          Uploading patch 1b which is the same as 1a plus an extra comment.
          Committed revision 824694.

          Show
          Knut Anders Hatlen added a comment - Uploading patch 1b which is the same as 1a plus an extra comment. Committed revision 824694.
          Hide
          Dag H. Wanvik added a comment -

          > perhaps bulk-fetching is disabled when we have a correlated sub-query in the predicate list?

          I don't know, but even if it is, I think it is more convoluted to rely on that when pruning subquery visits.
          So, your present patch in view of this is fine.

          +1

          Btw, I saw this inside setUnionResultExpression

          /* DB2 requires both sides of union to have same name for the result to

          • have that name. Otherwise, leave it or set it to a generated name */

          Wonder what the standard says about this? (I came across it when playing with Bryan's patch for DERBY-4).

          Show
          Dag H. Wanvik added a comment - > perhaps bulk-fetching is disabled when we have a correlated sub-query in the predicate list? I don't know, but even if it is, I think it is more convoluted to rely on that when pruning subquery visits. So, your present patch in view of this is fine. +1 Btw, I saw this inside setUnionResultExpression /* DB2 requires both sides of union to have same name for the result to have that name. Otherwise, leave it or set it to a generated name */ Wonder what the standard says about this? (I came across it when playing with Bryan's patch for DERBY-4 ).
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for looking at the patch, Dag.

          > I wonder if it is meaningful for the visitor in
          > PredicateList.markReferencedColumns() to go down into subqueries.

          If the sub-query is correlated, I think we'll need to go down into it to see if
          it references any of the columns in the index. Now, it doesn't look like
          markReferencedColumns() is called if we have a correlated sub-query. It is only
          called when bulk-fetching from a non-covering index, and perhaps bulk-fetching
          is disabled when we have a correlated sub-query in the predicate list?

          > At the corresponding place (column reference) in a query tree for
          >
          > select * from (select * from t union select * from t) x
          >
          > I see that the column reference source actually points to a result column of
          > the PRN wrapping the base table in the left query tree of the UNION. (That
          > also seems a bit arbitrary, why not the right?

          I had the same feeling that it was somewhat arbitrary when I looked at the
          logic in ResultColumnList.setUnionResultExpression(). For EXCEPT queries, it
          may make sense, since all the rows in the result come from the left side. For
          UNION and INTERSECT, they could just as well come from the right side, though.

          > If it is sound for the VALUE case to have an empty column reference, I can't
          > answer that, because I don't fully understand how the column references are
          > used in later phases.

          Neither do I. BaseTableNumbersVisitor.visit() says this about the case where
          getSource() returns null:

          if (rc == null)

          { // this can happen if column reference is pointing to a column // that is not from a base table. For example, if we have a // VALUES clause like // // (values (1, 2), (3, 4)) V1 (i, j) // // and then a column reference to VI.i, the column reference // won't have a source. return node; }

          This is actually just a variant of the case we're looking at, since a multi-row
          value statement is rewritten to a union internally. So it's possible to
          reproduce the NPE without explicitly using a set operation, as seen here:

          ij(CONNECTION2)> select * from t where a < (values 1,2,3);
          ERROR XJ001: Java exception: ': java.lang.NullPointerException'.
          ij(CONNECTION2)> select * from t where a < (select max from (values 1,2,3) v);
          ERROR XJ001: Java exception: ': java.lang.NullPointerException'.

          Although I'm still not 100% convinced it is correct that the CR is null in the
          first place, I'm inclined to go for the 1a patch, but with a comment like the
          one in BTNV.visit() added to markReferencedColumns(). Since such a change only
          affects statements that previously raised a NPE, it should at least not make
          the situation worse for queries that used to work.

          Show
          Knut Anders Hatlen added a comment - Thanks for looking at the patch, Dag. > I wonder if it is meaningful for the visitor in > PredicateList.markReferencedColumns() to go down into subqueries. If the sub-query is correlated, I think we'll need to go down into it to see if it references any of the columns in the index. Now, it doesn't look like markReferencedColumns() is called if we have a correlated sub-query. It is only called when bulk-fetching from a non-covering index, and perhaps bulk-fetching is disabled when we have a correlated sub-query in the predicate list? > At the corresponding place (column reference) in a query tree for > > select * from (select * from t union select * from t) x > > I see that the column reference source actually points to a result column of > the PRN wrapping the base table in the left query tree of the UNION. (That > also seems a bit arbitrary, why not the right? I had the same feeling that it was somewhat arbitrary when I looked at the logic in ResultColumnList.setUnionResultExpression(). For EXCEPT queries, it may make sense, since all the rows in the result come from the left side. For UNION and INTERSECT, they could just as well come from the right side, though. > If it is sound for the VALUE case to have an empty column reference, I can't > answer that, because I don't fully understand how the column references are > used in later phases. Neither do I. BaseTableNumbersVisitor.visit() says this about the case where getSource() returns null: if (rc == null) { // this can happen if column reference is pointing to a column // that is not from a base table. For example, if we have a // VALUES clause like // // (values (1, 2), (3, 4)) V1 (i, j) // // and then a column reference to VI.i, the column reference // won't have a source. return node; } This is actually just a variant of the case we're looking at, since a multi-row value statement is rewritten to a union internally. So it's possible to reproduce the NPE without explicitly using a set operation, as seen here: ij(CONNECTION2)> select * from t where a < (values 1,2,3); ERROR XJ001: Java exception: ': java.lang.NullPointerException'. ij(CONNECTION2)> select * from t where a < (select max from (values 1,2,3) v ); ERROR XJ001: Java exception: ': java.lang.NullPointerException'. Although I'm still not 100% convinced it is correct that the CR is null in the first place, I'm inclined to go for the 1a patch, but with a comment like the one in BTNV.visit() added to markReferencedColumns(). Since such a change only affects statements that previously raised a NPE, it should at least not make the situation worse for queries that used to work.
          Hide
          Dag H. Wanvik added a comment -

          I looked at this patch and verified that the patch solved the issue, and that the new test case fails without the patch.
          Thanks!

          I wonder if it is meaningful for the visitor in PredicateList.markReferencedColumns() to go down into subqueries. This
          ColumnReference with the empty source here, stems from inside the subquery, and looking at the logic in FromBaseTable#changeAccessPath, what we are doing here is to collect needed columns from the outer table. I think that means that we do not need to look inside a subquery?

          Changing to this visitor declaration makes the query work too:

          CollectNodesVisitor collectCRs =
          new CollectNodesVisitor(ColumnReference.class,
          SubqueryNode.class);

          As for the question of the empty source,
          looking at the query tree, I see that there is indeed a column reference which has an empty source, which I also
          see, e.g in this query:

          select * from (values 2 union values 2) x

          At the corresponding place (column reference) in a query tree for

          select * from (select * from t union select * from t) x

          I see that the column reference source actually points to a result column of the PRN wrapping the base table in the left query tree of the UNION. (That also seems a bit arbitrary, why not the right? If it is sound for the VALUE case to have an empty column reference, I can't answer that, because I don't fully understand how the column references are used in later phases.

          Show
          Dag H. Wanvik added a comment - I looked at this patch and verified that the patch solved the issue, and that the new test case fails without the patch. Thanks! I wonder if it is meaningful for the visitor in PredicateList.markReferencedColumns() to go down into subqueries. This ColumnReference with the empty source here, stems from inside the subquery, and looking at the logic in FromBaseTable#changeAccessPath, what we are doing here is to collect needed columns from the outer table. I think that means that we do not need to look inside a subquery? Changing to this visitor declaration makes the query work too: CollectNodesVisitor collectCRs = new CollectNodesVisitor(ColumnReference.class, SubqueryNode.class); As for the question of the empty source, looking at the query tree, I see that there is indeed a column reference which has an empty source, which I also see, e.g in this query: select * from (values 2 union values 2) x At the corresponding place (column reference) in a query tree for select * from (select * from t union select * from t) x I see that the column reference source actually points to a result column of the PRN wrapping the base table in the left query tree of the UNION. (That also seems a bit arbitrary, why not the right? If it is sound for the VALUE case to have an empty column reference, I can't answer that, because I don't fully understand how the column references are used in later phases.
          Hide
          Knut Anders Hatlen added a comment -

          It looks like checking whether ref.getSource() is null in PredicateList.markReferencedColumns() and only calling markAllRCsInChainReferenced() on it if it's non-null makes the NPE go away and the queries return the expected results. I'm not sure, but I think it is OK for ColumnReference.getSource() to return null (see for example the comment in BaseTableNumbersVisitor.visit()), in which case markReferencedColumns() should be prepared for it. The ColumnReference with no source is generated by ResultColumnList.setUnionResultExpression().

          The attached patch (d4391-1a.diff) adds the null check and also adds test cases to lang/union.sql. All the regression tests ran cleanly with the patch.

          Show
          Knut Anders Hatlen added a comment - It looks like checking whether ref.getSource() is null in PredicateList.markReferencedColumns() and only calling markAllRCsInChainReferenced() on it if it's non-null makes the NPE go away and the queries return the expected results. I'm not sure, but I think it is OK for ColumnReference.getSource() to return null (see for example the comment in BaseTableNumbersVisitor.visit()), in which case markReferencedColumns() should be prepared for it. The ColumnReference with no source is generated by ResultColumnList.setUnionResultExpression(). The attached patch (d4391-1a.diff) adds the null check and also adds test cases to lang/union.sql. All the regression tests ran cleanly with the patch.
          Hide
          Knut Anders Hatlen added a comment -

          It looks like the NPE is thrown if the left side of the set operation has a column which is not a simple column reference. So the following queries (with expressions or constants on the left side of the union) fail:

          select * from t where a < (select 4 from t except select b from t)
          select * from t where a < (select a+b from t except select 4 from t)
          select * from t where a < (select a+b from t except select a from t)
          select * from t where a < (select sum(a) from t intersect select b from t)

          Whereas these don't raise a NPE:

          select * from t where a < (select a from t intersect all select b from t)
          select * from t where a < (select b from t except select 4 from t)
          select * from t where a < (select a from t except select a+b from t)
          select * from t where a < (select b from t union select sum(a) from t)

          Also worth to notice is that the NPE is not raised if the binary arithmetic operator is =, only if it is <, >, <= or >=.

          Show
          Knut Anders Hatlen added a comment - It looks like the NPE is thrown if the left side of the set operation has a column which is not a simple column reference. So the following queries (with expressions or constants on the left side of the union) fail: select * from t where a < (select 4 from t except select b from t) select * from t where a < (select a+b from t except select 4 from t) select * from t where a < (select a+b from t except select a from t) select * from t where a < (select sum(a) from t intersect select b from t) Whereas these don't raise a NPE: select * from t where a < (select a from t intersect all select b from t) select * from t where a < (select b from t except select 4 from t) select * from t where a < (select a from t except select a+b from t) select * from t where a < (select b from t union select sum(a) from t) Also worth to notice is that the NPE is not raised if the binary arithmetic operator is =, only if it is <, >, <= or >=.
          Hide
          Knut Anders Hatlen added a comment -

          Full stack trace:

          java.lang.NullPointerException
          at org.apache.derby.impl.sql.compile.PredicateList.markReferencedColumns(PredicateList.java:1641)
          at org.apache.derby.impl.sql.compile.FromBaseTable.changeAccessPath(FromBaseTable.java:2957)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.modifyAccessPath(ProjectRestrictNode.java:772)
          at org.apache.derby.impl.sql.compile.OptimizerImpl.modifyAccessPaths(OptimizerImpl.java:2459)
          at org.apache.derby.impl.sql.compile.SelectNode.modifyAccessPaths(SelectNode.java:1867)
          at org.apache.derby.impl.sql.compile.DMLStatementNode.optimizeStatement(DMLStatementNode.java:323)
          at org.apache.derby.impl.sql.compile.CursorNode.optimizeStatement(CursorNode.java:573)
          at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:373)
          at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:88)
          at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:824)
          at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:606)
          at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:555)
          at org.apache.derby.impl.tools.ij.ij.executeImmediate(ij.java:329)
          at org.apache.derby.impl.tools.ij.utilMain.doCatch(utilMain.java:521)
          at org.apache.derby.impl.tools.ij.utilMain.runScriptGuts(utilMain.java:366)
          at org.apache.derby.impl.tools.ij.utilMain.go(utilMain.java:261)
          at org.apache.derby.impl.tools.ij.Main.go(Main.java:229)
          at org.apache.derby.impl.tools.ij.Main.mainCore(Main.java:184)
          at org.apache.derby.impl.tools.ij.Main.main(Main.java:75)
          at org.apache.derby.tools.ij.main(ij.java:59)
          at org.apache.derby.iapi.tools.run.main(run.java:53)

          Show
          Knut Anders Hatlen added a comment - Full stack trace: java.lang.NullPointerException at org.apache.derby.impl.sql.compile.PredicateList.markReferencedColumns(PredicateList.java:1641) at org.apache.derby.impl.sql.compile.FromBaseTable.changeAccessPath(FromBaseTable.java:2957) at org.apache.derby.impl.sql.compile.ProjectRestrictNode.modifyAccessPath(ProjectRestrictNode.java:772) at org.apache.derby.impl.sql.compile.OptimizerImpl.modifyAccessPaths(OptimizerImpl.java:2459) at org.apache.derby.impl.sql.compile.SelectNode.modifyAccessPaths(SelectNode.java:1867) at org.apache.derby.impl.sql.compile.DMLStatementNode.optimizeStatement(DMLStatementNode.java:323) at org.apache.derby.impl.sql.compile.CursorNode.optimizeStatement(CursorNode.java:573) at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:373) at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:88) at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:824) at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:606) at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:555) at org.apache.derby.impl.tools.ij.ij.executeImmediate(ij.java:329) at org.apache.derby.impl.tools.ij.utilMain.doCatch(utilMain.java:521) at org.apache.derby.impl.tools.ij.utilMain.runScriptGuts(utilMain.java:366) at org.apache.derby.impl.tools.ij.utilMain.go(utilMain.java:261) at org.apache.derby.impl.tools.ij.Main.go(Main.java:229) at org.apache.derby.impl.tools.ij.Main.mainCore(Main.java:184) at org.apache.derby.impl.tools.ij.Main.main(Main.java:75) at org.apache.derby.tools.ij.main(ij.java:59) at org.apache.derby.iapi.tools.run.main(run.java:53)

            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Knut Anders Hatlen
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development