Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 10.6.1.0
    • Fix Version/s: 10.6.1.0
    • Component/s: SQL
    • Labels:
      None

      Description

      The SQL standard defines two ways to specify the join condition in an INNER JOIN or a (LEFT/RIGHT/FULL) OUTER JOIN: with an ON clause or with a USING clause. Derby currently only accepts joins with an ON clause. Internally, Derby has code that supports USING. This code should be enabled to ease the migration to Derby. We must also verify that the implementation adheres to the standard before we enable it.

      Since USING is already a reserved keyword in Derby's parser, enabling the USING syntax should not cause any compatibility issues for existing queries.

      1. using.diff
        1.0 kB
        Knut Anders Hatlen
      2. using_v2.diff
        18 kB
        Knut Anders Hatlen
      3. using_v2.stat
        0.5 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Hide
          Knut Anders Hatlen added a comment -

          The attached patch enables the USING syntax in the parser.

          It appears to do the right thing in the simple queries below. The columns specified in the USING clause come first in the result, then all the columns in T1 except those in USING, and then all the columns in T2 except those in USING.

          ij> create table t1(a int, b int, c int);
          0 rows inserted/updated/deleted
          ij> insert into t1 values (1,2,3),(2,3,1),(4,4,4);
          3 rows inserted/updated/deleted
          ij> create table t2(b int, c int, d int);
          0 rows inserted/updated/deleted
          ij> insert into t2 values (1,2,3),(2,3,1),(5,5,5);
          3 rows inserted/updated/deleted
          ij> select * from t1 join t2 using (b,c);
          B |C |A |D
          -----------------------------------------------
          2 |3 |1 |1

          1 row selected
          ij> select * from t1 join t2 using (b);
          B |A |C |C |D
          -----------------------------------------------------------
          2 |1 |3 |3 |1

          1 row selected
          ij> select * from t1 right join t2 using (b,c);
          B |C |A |D
          -----------------------------------------------
          1 |2 |NULL |3
          2 |3 |1 |1
          5 |5 |NULL |5

          3 rows selected

          There are some problems that would need to be resolved, though. Some quick tests revealed this:

          1) NullPointerException if none of the columns in the USING clause are shared between the tables that are being joined:

          ij> select * from t1 join t2 using (a,d);
          ERROR XJ001: Java exception: ': java.lang.NullPointerException'.

          2) I'm not sure which behaviour is correct according to the standard, but at least this query differs from PostgreSQL and MySQL in which columns to return:

          ij> select t1., t2. from t1 join t2 using (b,c);
          A |D
          -----------------------
          1 |1

          1 row selected

          PostgreSQL and MySQL return more columns (results below from PostgreSQL):

          kh160127=# select t1., t2. from t1 join t2 using (b,c);
          a | b | c | b | c | d
          ------+--
          1 | 2 | 3 | 2 | 3 | 1
          (1 row)

          Show
          Knut Anders Hatlen added a comment - The attached patch enables the USING syntax in the parser. It appears to do the right thing in the simple queries below. The columns specified in the USING clause come first in the result, then all the columns in T1 except those in USING, and then all the columns in T2 except those in USING. ij> create table t1(a int, b int, c int); 0 rows inserted/updated/deleted ij> insert into t1 values (1,2,3),(2,3,1),(4,4,4); 3 rows inserted/updated/deleted ij> create table t2(b int, c int, d int); 0 rows inserted/updated/deleted ij> insert into t2 values (1,2,3),(2,3,1),(5,5,5); 3 rows inserted/updated/deleted ij> select * from t1 join t2 using (b,c); B |C |A |D ----------------------------------------------- 2 |3 |1 |1 1 row selected ij> select * from t1 join t2 using (b); B |A |C |C |D ----------------------------------------------------------- 2 |1 |3 |3 |1 1 row selected ij> select * from t1 right join t2 using (b,c); B |C |A |D ----------------------------------------------- 1 |2 |NULL |3 2 |3 |1 |1 5 |5 |NULL |5 3 rows selected There are some problems that would need to be resolved, though. Some quick tests revealed this: 1) NullPointerException if none of the columns in the USING clause are shared between the tables that are being joined: ij> select * from t1 join t2 using (a,d); ERROR XJ001: Java exception: ': java.lang.NullPointerException'. 2) I'm not sure which behaviour is correct according to the standard, but at least this query differs from PostgreSQL and MySQL in which columns to return: ij> select t1. , t2. from t1 join t2 using (b,c); A |D ----------------------- 1 |1 1 row selected PostgreSQL and MySQL return more columns (results below from PostgreSQL): kh160127=# select t1. , t2. from t1 join t2 using (b,c); a | b | c | b | c | d -- - - - - + -- 1 | 2 | 3 | 2 | 3 | 1 (1 row)
          Hide
          Dag H. Wanvik added a comment -

          Patch looks simple enough, I don't know about the behavior you mention, I'll go see if I can grok the standard on this.

          Show
          Dag H. Wanvik added a comment - Patch looks simple enough, I don't know about the behavior you mention, I'll go see if I can grok the standard on this.
          Hide
          Dag H. Wanvik added a comment -

          I think Derby is correct in suppressing the join column in the case
          you show above. Consider this stanza from SQl 2003, vol 2, section
          7.12 <query specification>, syntax rule 7 on <asterisked identifier
          chain>, paragraph g), case i):

          "If the basis is a <table or query name> or <correlation name>, then
          let TQ be the table associated with the basis. The <select sublist> is
          equivalent to a <value expression> sequence in which each <value
          expression> is a column reference CR that references a column of TQ
          that is not a common column of a <joined table>. Each column of TQ
          ***************************
          that is not a referenced common column shall be referenced exactly
          once. The columns shall be referenced in the ascending sequence of
          their ordinal positions within TQ."

          Note my underlining. A "common column" (below) is the one used in the USING
          clause, so those columns should be omitted, as does Derby.

          For the definition of "common column", see section 7.7 <joined table>,
          syntax rule 7, paragraphs a) and b):

          "7) If NATURAL is specified or if a <join specification> immediately
          containing a <named columns join> is specified, then:

          a) If NATURAL is specified, then let common column name be a <field
          name> that is equivalent to the <field name> of exactly one field of
          RT1 and the <field name> of exactly one field of RT2. RT1 shall not
          have any duplicate common column names and RT2 shall not have any
          duplicate common column names. Let corresponding join columns refer to
          all fields of RT1 and RT2 that have common column names, if any.

          b) If a <named columns join> is specified, then every <column name>
          in the <join column list> shall be equivalent to the <field name>
          of exactly one field of RT1 and the <field name> of exactly one
          field of RT2. Let common column name be the name of such a
          column. Let corresponding join columns refer to the columns
          identified in the <join column list>."

          So, this suppression also applies for NATURAL JOIN, if we want to
          implement that.

          Show
          Dag H. Wanvik added a comment - I think Derby is correct in suppressing the join column in the case you show above. Consider this stanza from SQl 2003, vol 2, section 7.12 <query specification>, syntax rule 7 on <asterisked identifier chain>, paragraph g), case i): "If the basis is a <table or query name> or <correlation name>, then let TQ be the table associated with the basis. The <select sublist> is equivalent to a <value expression> sequence in which each <value expression> is a column reference CR that references a column of TQ that is not a common column of a <joined table>. Each column of TQ *************************** that is not a referenced common column shall be referenced exactly once. The columns shall be referenced in the ascending sequence of their ordinal positions within TQ." Note my underlining. A "common column" (below) is the one used in the USING clause, so those columns should be omitted, as does Derby. For the definition of "common column", see section 7.7 <joined table>, syntax rule 7, paragraphs a) and b): "7) If NATURAL is specified or if a <join specification> immediately containing a <named columns join> is specified, then: a) If NATURAL is specified, then let common column name be a <field name> that is equivalent to the <field name> of exactly one field of RT1 and the <field name> of exactly one field of RT2. RT1 shall not have any duplicate common column names and RT2 shall not have any duplicate common column names. Let corresponding join columns refer to all fields of RT1 and RT2 that have common column names, if any. b) If a <named columns join> is specified, then every <column name> in the <join column list> shall be equivalent to the <field name> of exactly one field of RT1 and the <field name> of exactly one field of RT2. Let common column name be the name of such a column. Let corresponding join columns refer to the columns identified in the <join column list>." So, this suppression also applies for NATURAL JOIN, if we want to implement that.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for digging into the standard, Dag. I agree that what you quoted above must mean that the behaviour seen with the patch is in accordance with the standard. It's strange, though, that both PostgreSQL and MySQL have interpreted this in another way. Does anyone have access to other DBMSs where they could test this query and see which columns they return?

          Show
          Knut Anders Hatlen added a comment - Thanks for digging into the standard, Dag. I agree that what you quoted above must mean that the behaviour seen with the patch is in accordance with the standard. It's strange, though, that both PostgreSQL and MySQL have interpreted this in another way. Does anyone have access to other DBMSs where they could test this query and see which columns they return?
          Hide
          Knut Anders Hatlen added a comment -

          Uploading an updated patch (v2). Changes from the previous revision:

          • Fixed the NPE when referencing non-common columns in the USING clause. This condition was probably detected earlier originally when the USING code was written and thereby preventing that the code that threw NPE was reached. Now columns out of scope are detected very late in the bind phase (in JoinNode.deferredBindExpressions(), see also DERBY-4284 and DERBY-2916 for other issues related to late binding in JoinNode). The fix was simple: just check the value returned by getResultColumn(columnName) in ResultColumnList.getJoinColumns() and raise a StandardException if it's null.
          • Added various test cases for USING in JoinTest. Verified that PostgreSQL gave the same results (except for the case with asterisks with table qualifiers, as discussed above).
          • Removed negative test case from lang/db2Compatibility.sql.
          • Updated canon for lang/innerjoin.sql because the message (not the SQLState) for a syntax error was changed for a negative test case because of the grammar changes.

          All the regression tests ran cleanly with the patch.

          Show
          Knut Anders Hatlen added a comment - Uploading an updated patch (v2). Changes from the previous revision: Fixed the NPE when referencing non-common columns in the USING clause. This condition was probably detected earlier originally when the USING code was written and thereby preventing that the code that threw NPE was reached. Now columns out of scope are detected very late in the bind phase (in JoinNode.deferredBindExpressions(), see also DERBY-4284 and DERBY-2916 for other issues related to late binding in JoinNode). The fix was simple: just check the value returned by getResultColumn(columnName) in ResultColumnList.getJoinColumns() and raise a StandardException if it's null. Added various test cases for USING in JoinTest. Verified that PostgreSQL gave the same results (except for the case with asterisks with table qualifiers, as discussed above). Removed negative test case from lang/db2Compatibility.sql. Updated canon for lang/innerjoin.sql because the message (not the SQLState) for a syntax error was changed for a negative test case because of the grammar changes. All the regression tests ran cleanly with the patch.
          Hide
          Knut Anders Hatlen added a comment -

          Committed revision 823126.

          Show
          Knut Anders Hatlen added a comment - Committed revision 823126.

            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Knut Anders Hatlen
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development