Derby
  1. Derby
  2. DERBY-4380

Subqueries not allowed in ON clause

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.5.3.0
    • Fix Version/s: 10.6.1.0
    • Component/s: SQL
    • Labels:
      None
    • Issue & fix info:
      Release Note Needed, Repro attached

      Description

      SOME is not allowed in ON-clause:

      ij> create table t1 (i integer);
      0 rows inserted/updated/deleted
      ij> create table t2 (i integer);
      0 rows inserted/updated/deleted
      ij> create table t3 (i integer);
      0 rows inserted/updated/deleted
      ij> insert into t1 values (1);
      1 row inserted/updated/deleted
      ij> insert into t2 values (2);
      1 row inserted/updated/deleted
      ij> insert into t3 values 2,3,4;
      3 rows inserted/updated/deleted
      ij> select * from t1 where t1.i = some (select i from t3);
      I
      -----------

      0 rows selected
      ij> select * from t1 inner join t2 on t1.i = some (select i from t3);
      ERROR 42972: An ON clause associated with a JOIN operator is not valid.
      ij>

      1. remove_translations.diff
        14 kB
        Knut Anders Hatlen
      2. releaseNote.html
        5 kB
        Knut Anders Hatlen
      3. on_subquery.diff
        17 kB
        Knut Anders Hatlen
      4. on_subquery_v2.stat
        0.9 kB
        Knut Anders Hatlen
      5. on_subquery_v2.diff
        30 kB
        Knut Anders Hatlen

        Activity

        Bernt M. Johnsen created issue -
        Hide
        Bernt M. Johnsen added a comment -

        Applies also to IN:
        ij> select * from t1 inner join t2 on t1.i in (select i from t3);
        ERROR 42972: An ON clause associated with a JOIN operator is not valid.

        Show
        Bernt M. Johnsen added a comment - Applies also to IN: ij> select * from t1 inner join t2 on t1.i in (select i from t3); ERROR 42972: An ON clause associated with a JOIN operator is not valid.
        Hide
        Knut Anders Hatlen added a comment -

        From JoinNode:

        /* DB2 doesn't allow subquerries in the ON clause */
        if (subqueryList.size() > 0)
        throw StandardException.newException(SQLState.LANG_DB2_ON_CLAUSE_INVALID);

        Show
        Knut Anders Hatlen added a comment - From JoinNode: /* DB2 doesn't allow subquerries in the ON clause */ if (subqueryList.size() > 0) throw StandardException.newException(SQLState.LANG_DB2_ON_CLAUSE_INVALID);
        Hide
        Knut Anders Hatlen added a comment -

        I commented out the code above and ran a couple of small tests, and it appears to do the right thing. So unless the SQL standard prohibits subqueries in ON clauses, I suggest we just remove that restriction and add some test cases to verify the functionality in the regression tests.

        Show
        Knut Anders Hatlen added a comment - I commented out the code above and ran a couple of small tests, and it appears to do the right thing. So unless the SQL standard prohibits subqueries in ON clauses, I suggest we just remove that restriction and add some test cases to verify the functionality in the regression tests.
        Hide
        Knut Anders Hatlen added a comment -

        Attached is a patch (not for commit) that removes the restriction that ON clauses can't contain subqueries. It also adds a test case to JoinTest and updates negative test cases in lang/outerjoin.sql, lang/innerjoin.sql and lang/lojreorder.sql so that they expect results to be returned instead of an exception.

        With this patch, two of the test cases in lang/lojreorder.sql that used to raise ERROR 42972, now raise a NullPointerException. The stack trace looks similar to DERBY-4342. The patch should not be committed until this has been investigated and dealt with.

        Show
        Knut Anders Hatlen added a comment - Attached is a patch (not for commit) that removes the restriction that ON clauses can't contain subqueries. It also adds a test case to JoinTest and updates negative test cases in lang/outerjoin.sql, lang/innerjoin.sql and lang/lojreorder.sql so that they expect results to be returned instead of an exception. With this patch, two of the test cases in lang/lojreorder.sql that used to raise ERROR 42972, now raise a NullPointerException. The stack trace looks similar to DERBY-4342 . The patch should not be committed until this has been investigated and dealt with.
        Knut Anders Hatlen made changes -
        Field Original Value New Value
        Attachment on_subquery.diff [ 12420178 ]
        Hide
        Mamta A. Satoor added a comment -

        I looked at SQL 2003 Foundation spec and through various links, it shows that ON clauses can have subqueries.
        Section 7.7 <joined table> (page 312)
        which shows
        <joined table> ::=
        <cross join>

        <qualified join>
        <natural join>

        On the same page (312), qualified join is defined as
        <qualified join> ::=
        <table reference> [ <join type> ] JOIN <table reference> <join specification>

        Clicking on <join specification> takes to following definition on the same page
        <join specification> ::=
        <join condition>

        <named columns join>

        And <join condition> talks about the ON clause and <search condition> as follows
        <join condition> ::= ON <search condition>

        Clicking on <search condition> takes to page 416. Following the bnf of <search condition> finally leads to <predicate> on page 371 and one of the possibilities for <predicate> is <quantified comparison predicate> and page 397 for <quantified comparison predicate> shows that we can have subquery here with SOME/ANY.

        So, long story short, the change suggested by this jira entry is SQL compatible.

        Show
        Mamta A. Satoor added a comment - I looked at SQL 2003 Foundation spec and through various links, it shows that ON clauses can have subqueries. Section 7.7 <joined table> (page 312) which shows <joined table> ::= <cross join> <qualified join> <natural join> On the same page (312), qualified join is defined as <qualified join> ::= <table reference> [ <join type> ] JOIN <table reference> <join specification> Clicking on <join specification> takes to following definition on the same page <join specification> ::= <join condition> <named columns join> And <join condition> talks about the ON clause and <search condition> as follows <join condition> ::= ON <search condition> Clicking on <search condition> takes to page 416. Following the bnf of <search condition> finally leads to <predicate> on page 371 and one of the possibilities for <predicate> is <quantified comparison predicate> and page 397 for <quantified comparison predicate> shows that we can have subquery here with SOME/ANY. So, long story short, the change suggested by this jira entry is SQL compatible.
        Hide
        Knut Anders Hatlen added a comment -

        Thanks for taking the time to check the standard, Mamta! My understanding is the same as yours. Basically, it looks like the standard allows the same syntactic elements in ON clauses as in WHERE clauses:

        <join condition> ::= ON <search condition>

        <where clause> ::= WHERE <search condition>

        Show
        Knut Anders Hatlen added a comment - Thanks for taking the time to check the standard, Mamta! My understanding is the same as yours. Basically, it looks like the standard allows the same syntactic elements in ON clauses as in WHERE clauses: <join condition> ::= ON <search condition> <where clause> ::= WHERE <search condition>
        Hide
        Knut Anders Hatlen added a comment -

        Here's a stripped down repro for the NPEs seen in the lojreorder test with the patch:

        ij> create table t(x int);
        0 rows inserted/updated/deleted
        ij> insert into t values 1;
        1 row inserted/updated/deleted
        ij> select * from t t1 right join t t2 on 1=1 and t1.x in (select a.x from t a left join t b on 1=0 where a.x=b.x);
        X |X
        -----------------------
        ERROR 38000: The exception 'java.lang.NullPointerException' was thrown while evaluating an expression.
        ERROR XJ001: Java exception: ': java.lang.NullPointerException'.
        ij> select * from t t1 right join t t2 on 1=1 and t1.x >= ALL (select a.x from t a left join t b on 1=0 where a.x=b.x);
        X |X
        -----------------------
        ERROR 38000: The exception 'java.lang.NullPointerException' was thrown while evaluating an expression.
        ERROR XJ001: Java exception: ': java.lang.NullPointerException'.

        Show
        Knut Anders Hatlen added a comment - Here's a stripped down repro for the NPEs seen in the lojreorder test with the patch: ij> create table t(x int); 0 rows inserted/updated/deleted ij> insert into t values 1; 1 row inserted/updated/deleted ij> select * from t t1 right join t t2 on 1=1 and t1.x in (select a.x from t a left join t b on 1=0 where a.x=b.x); X |X ----------------------- ERROR 38000: The exception 'java.lang.NullPointerException' was thrown while evaluating an expression. ERROR XJ001: Java exception: ': java.lang.NullPointerException'. ij> select * from t t1 right join t t2 on 1=1 and t1.x >= ALL (select a.x from t a left join t b on 1=0 where a.x=b.x); X |X ----------------------- ERROR 38000: The exception 'java.lang.NullPointerException' was thrown while evaluating an expression. ERROR XJ001: Java exception: ': java.lang.NullPointerException'.
        Mike Matrigali made changes -
        Component/s SQL [ 11408 ]
        Component/s Store [ 11412 ]
        Mike Matrigali made changes -
        Issue Type Bug [ 1 ] Improvement [ 4 ]
        Hide
        Knut Anders Hatlen added a comment -

        Then NPE can be triggered with an even simpler query:

        create table t(x int);
        insert into t values 1;
        select 1 from t t1 join t t2 on t1.x in
        (select 1 from t a left join t b on 1=0 where b.x=1);

        When I look at the query tree after optimization, it looks as if the result column list for the HalfOuterJoinNode contains a ResultColumn/VirtualColumnNode chain that ends up in a BaseColumn for B.X, which sounds reasonable since B.X is needed for the restriction B.X=1. However, the HalfOuterJoinNode's right child, from which that base column should be retrieved, has a separate chain that ends up in a different BaseColumn for B.X. I believe that in a correct tree, the column chain in the HalfOuterJoinNode should be connected to the chain of its right child, and there should be only one BaseColumn for B.X in the tree.

        Show
        Knut Anders Hatlen added a comment - Then NPE can be triggered with an even simpler query: create table t(x int); insert into t values 1; select 1 from t t1 join t t2 on t1.x in (select 1 from t a left join t b on 1=0 where b.x=1); When I look at the query tree after optimization, it looks as if the result column list for the HalfOuterJoinNode contains a ResultColumn/VirtualColumnNode chain that ends up in a BaseColumn for B.X, which sounds reasonable since B.X is needed for the restriction B.X=1. However, the HalfOuterJoinNode's right child, from which that base column should be retrieved, has a separate chain that ends up in a different BaseColumn for B.X. I believe that in a correct tree, the column chain in the HalfOuterJoinNode should be connected to the chain of its right child, and there should be only one BaseColumn for B.X in the tree.
        Hide
        Knut Anders Hatlen added a comment -

        JoinNode.deferredBindExpressions() binds the expression in the ON
        clause twice, and after the second time it's bound, some parts of the
        tree use the BaseColumnNodes generated the first time, and others use
        those generated the second time. I believe this is the inconsistency
        that leads to a NullPointerException when the statement is executed.

        The double binding is done in order to raise the correct/expected
        exceptions, it seems. If I remove one of them, the NPE goes away, but
        some statements that are supposed to fail change behaviour (either
        they stop failing, or they fail with a different error message,
        depending on which of the two calls to bindExpression() is commented
        out). It seems like the double binding is partly there to harmonize SQL
        states with those raised by DB2.

        Show
        Knut Anders Hatlen added a comment - JoinNode.deferredBindExpressions() binds the expression in the ON clause twice, and after the second time it's bound, some parts of the tree use the BaseColumnNodes generated the first time, and others use those generated the second time. I believe this is the inconsistency that leads to a NullPointerException when the statement is executed. The double binding is done in order to raise the correct/expected exceptions, it seems. If I remove one of them, the NPE goes away, but some statements that are supposed to fail change behaviour (either they stop failing, or they fail with a different error message, depending on which of the two calls to bindExpression() is commented out). It seems like the double binding is partly there to harmonize SQL states with those raised by DB2.
        Knut Anders Hatlen made changes -
        Assignee Knut Anders Hatlen [ knutanders ]
        Knut Anders Hatlen made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        Hide
        Knut Anders Hatlen added a comment -

        Changed "SOME" to "Subqueries" in the summary, since the problem is not limited to SOME subqueries.

        Show
        Knut Anders Hatlen added a comment - Changed "SOME" to "Subqueries" in the summary, since the problem is not limited to SOME subqueries.
        Knut Anders Hatlen made changes -
        Summary SOME not allowed in ON clause Subqueries not allowed in ON clause
        Hide
        Knut Anders Hatlen added a comment -

        Here's an updated patch which removes the double binding of the ON
        clause which caused NullPointerExceptions in lang/lojreorder.sql. The
        removal of the double binding changed the behaviour of some more
        queries that used to fail before. As far as I know, the patch only
        affects queries that used to fail. Here's the full list of behavioural
        changes:

        -------------------------------

        1) JOIN queries with a subquery in the ON clause used to fail with the
        following message:
        > ERROR 42972: An ON clause associated with a JOIN operator is not valid.
        Now, those queries will be accepted and return a result.

        2) ON clauses with an unqualified column reference whose name matches
        a column in some other table in the FROM list used to fail. For
        instance:

        ij> create table t(x int);
        0 rows inserted/updated/deleted
        ij> create table tt(y int);
        0 rows inserted/updated/deleted
        ij> select * from t t1, t t2 join tt on x=y;
        ERROR 42X03: Column name 'X' is in more than one table in the FROM list.

        Since T1 is not in the scope of the ON clause, X must be T2.X and
        there is no ambiguity. With the patch, the query does not fail.

        3) If an ON clause referenced a column not in one of the tables that
        are being joined, it used to fail with this error message:

        ij> select * from t t1, t t2 join t t3 on t1.x=t2.x;
        ERROR 42972: An ON clause associated with a JOIN operator is not valid.

        With the patch, the error message is changed into:

        ERROR 42X04: Column 'T1.X' is either not in any table in the FROM list
        or appears within a join specification and is outside the scope of the
        join specification or appears in a HAVING clause and is not in the
        GROUP BY list. If this is a CREATE or ALTER TABLE statement then
        'T1.X' is not a column in the target table.

        -------------------------------

        Both (1) and (2) make our implementation more liberal, and shouldn't
        cause backward-compatibility issues. I've checked all the queries in
        our regression tests that are affected by (1) and (2), and all of them
        now give the same results as PostgreSQL and MySQL. As far as I can
        tell, accepting these queries is in compliance with the
        standard. (Note that we're still more restrictive than PostgreSQL and
        MySQL, since we don't accept correlated columns from outer queries in
        the ON clause, whereas they do.)

        (3) changes the SQLState and the message of a certain failure, which
        could introduce backwards-compatibility issues. However, I don't think
        it's very likely that many applications depend on this particular
        error code, since this error is something you'd expect to see during
        development, before you've got your SQL statements right, and not
        after the application is in production. So I'm leaning towards just
        adding a release note for the changed SQLState, and not trying to
        change the code to use the same SQLState as before.

        I also think the new message is somewhat more useful, since it both
        mentions the name of the problematic column, and that it might be
        outside the scope of the join specification. The old message just said
        that there was something wrong with one of the ON clauses in the
        statement.

        Here are the details from the patch:

        Engine code:

        • JoinNode.java
        • Removed the code that raised an exception if the ON clause contained
          a subquery
        • Removed the first call to joinClause.bindExpression() since its only
          purpose was to make the statements in (2) above fail, and the double
          binding made the query tree inconsistent when the ON clause
          contained subqueries
        • messages.xml
        • Removed now unused message
        • SQLState.java
        • Removed now unused message

        Test code:

        • JoinTest.java:
        • Added a test case for ON with subqueries
        • lojreorder.sql / lojreorder.out , outerjoin.sql / outerjoin.out ,
          innerjoin.sql / innerjoin.out
        • Updated comments and canons for queries that were expected to fail
          before
        • Updated canon for queries whose error message changed
        • dml158.out, dml160.out
        • Updated canon for queries whose error message changed
        Show
        Knut Anders Hatlen added a comment - Here's an updated patch which removes the double binding of the ON clause which caused NullPointerExceptions in lang/lojreorder.sql. The removal of the double binding changed the behaviour of some more queries that used to fail before. As far as I know, the patch only affects queries that used to fail. Here's the full list of behavioural changes: ------------------------------- 1) JOIN queries with a subquery in the ON clause used to fail with the following message: > ERROR 42972: An ON clause associated with a JOIN operator is not valid. Now, those queries will be accepted and return a result. 2) ON clauses with an unqualified column reference whose name matches a column in some other table in the FROM list used to fail. For instance: ij> create table t(x int); 0 rows inserted/updated/deleted ij> create table tt(y int); 0 rows inserted/updated/deleted ij> select * from t t1, t t2 join tt on x=y; ERROR 42X03: Column name 'X' is in more than one table in the FROM list. Since T1 is not in the scope of the ON clause, X must be T2.X and there is no ambiguity. With the patch, the query does not fail. 3) If an ON clause referenced a column not in one of the tables that are being joined, it used to fail with this error message: ij> select * from t t1, t t2 join t t3 on t1.x=t2.x; ERROR 42972: An ON clause associated with a JOIN operator is not valid. With the patch, the error message is changed into: ERROR 42X04: Column 'T1.X' is either not in any table in the FROM list or appears within a join specification and is outside the scope of the join specification or appears in a HAVING clause and is not in the GROUP BY list. If this is a CREATE or ALTER TABLE statement then 'T1.X' is not a column in the target table. ------------------------------- Both (1) and (2) make our implementation more liberal, and shouldn't cause backward-compatibility issues. I've checked all the queries in our regression tests that are affected by (1) and (2), and all of them now give the same results as PostgreSQL and MySQL. As far as I can tell, accepting these queries is in compliance with the standard. (Note that we're still more restrictive than PostgreSQL and MySQL, since we don't accept correlated columns from outer queries in the ON clause, whereas they do.) (3) changes the SQLState and the message of a certain failure, which could introduce backwards-compatibility issues. However, I don't think it's very likely that many applications depend on this particular error code, since this error is something you'd expect to see during development, before you've got your SQL statements right, and not after the application is in production. So I'm leaning towards just adding a release note for the changed SQLState, and not trying to change the code to use the same SQLState as before. I also think the new message is somewhat more useful, since it both mentions the name of the problematic column, and that it might be outside the scope of the join specification. The old message just said that there was something wrong with one of the ON clauses in the statement. Here are the details from the patch: Engine code: JoinNode.java Removed the code that raised an exception if the ON clause contained a subquery Removed the first call to joinClause.bindExpression() since its only purpose was to make the statements in (2) above fail, and the double binding made the query tree inconsistent when the ON clause contained subqueries messages.xml Removed now unused message SQLState.java Removed now unused message Test code: JoinTest.java: Added a test case for ON with subqueries lojreorder.sql / lojreorder.out , outerjoin.sql / outerjoin.out , innerjoin.sql / innerjoin.out Updated comments and canons for queries that were expected to fail before Updated canon for queries whose error message changed dml158.out, dml160.out Updated canon for queries whose error message changed
        Knut Anders Hatlen made changes -
        Attachment on_subquery_v2.stat [ 12425238 ]
        Attachment on_subquery_v2.diff [ 12425239 ]
        Knut Anders Hatlen made changes -
        Issue & fix info [Repro attached] [Patch Available, Repro attached]
        Hide
        Bryan Pendleton added a comment -

        I think that all the behavior changes that you describe are improvements, so no complaints from me for changing them!

        Show
        Bryan Pendleton added a comment - I think that all the behavior changes that you describe are improvements, so no complaints from me for changing them!
        Hide
        Knut Anders Hatlen added a comment -

        Committed revision 882106.

        Setting the flag "Release Note Needed" since we need a release note for the changed SQLState.

        The patch removed one error message, but it didn't remove the translations. I'll post a follow-up patch that removes the translations.

        Show
        Knut Anders Hatlen added a comment - Committed revision 882106. Setting the flag "Release Note Needed" since we need a release note for the changed SQLState. The patch removed one error message, but it didn't remove the translations. I'll post a follow-up patch that removes the translations.
        Knut Anders Hatlen made changes -
        Fix Version/s 10.6.0.0 [ 12313727 ]
        Affects Version/s 10.5.3.0 [ 12314117 ]
        Issue & fix info [Repro attached, Patch Available] [Release Note Needed, Repro attached]
        Hide
        Knut Anders Hatlen added a comment -

        Attached is a follow-up patch which removes the translations of the error message that was removed by the main patch. I have not run the regression tests yet.

        Show
        Knut Anders Hatlen added a comment - Attached is a follow-up patch which removes the translations of the error message that was removed by the main patch. I have not run the regression tests yet.
        Knut Anders Hatlen made changes -
        Attachment remove_translations.diff [ 12427896 ]
        Hide
        Knut Anders Hatlen added a comment -

        All the regression tests ran cleanly with the patch that removed the translated messages.
        Committed revision 890364.

        Show
        Knut Anders Hatlen added a comment - All the regression tests ran cleanly with the patch that removed the translated messages. Committed revision 890364.
        Hide
        Knut Anders Hatlen added a comment -

        Attaching release note.

        Show
        Knut Anders Hatlen added a comment - Attaching release note.
        Knut Anders Hatlen made changes -
        Attachment releaseNote.html [ 12434514 ]
        Knut Anders Hatlen made changes -
        Status In Progress [ 3 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Dag H. Wanvik added a comment -

        Release note looks good.

        Show
        Dag H. Wanvik added a comment - Release note looks good.
        Hide
        Knut Anders Hatlen added a comment -

        [bulk update] Close all resolved issues that haven't been updated for more than one year.

        Show
        Knut Anders Hatlen added a comment - [bulk update] Close all resolved issues that haven't been updated for more than one year.
        Knut Anders Hatlen made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Gavin made changes -
        Workflow jira [ 12477334 ] Default workflow, editable Closed status [ 12802857 ]

          People

          • Assignee:
            Knut Anders Hatlen
            Reporter:
            Bernt M. Johnsen
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development