Derby
  1. Derby
  2. DERBY-5613

Queries with group by column not included in the column list for JOIN(INNER or OUTER) with NATURAL or USING does not fail

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 10.8.2.2
    • Fix Version/s: None
    • Component/s: SQL
    • Urgency:
      Normal
    • Issue & fix info:
      Release Note Needed, Repro attached

      Description

      A query like following does not raise an error even though countries.country is not part of the SELECT column list.
      SELECT country,count(country) FROM
      COUNTRIES JOIN CITIES USING (COUNTRY) group by countries.country

      This jira is related to DERBY-4631. As noted by Knut in DERBY-4631, SQL:2003 says that the join columns in a natural join or in a named columns join should be added to the select list by coalescing the column from the left table with the column from the right table.

      Section 7.7, <joined table>, syntax rules:

      > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
      > reference> or <table factor> that is the second operand of the
      > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
      > respectively. Let TA and TB be the range variables of TR1 and TR2,
      > respectively. (...)

      and

      > 7) If NATURAL is specified or if a <join specification> immediately
      > containing a <named columns join> is specified, then:
      (...)
      > d) If there is at least one corresponding join column, then let SLCC
      > be a <select list> of <derived column>s of the form
      >
      > COALESCE ( TA.C, TB.C ) AS C
      >
      > for every column C that is a corresponding join column, taken in
      > order of their ordinal positions in RT1.

      Derby has it's on logic to retrieve the join column values. It always picks up join column's value from the left table when we are working with natural left outer join and it picks up the join column's value from the right table when we are working with natural right outer join. But this logic does not work for all cases for right outer join. The fix being worked for DERBY-4631 is to pick the join column's value based on following logic
      1)if the left table's column value is null then pick up the right table's column's value.
      2)If the left table's column value is non-null, then pick up that value

      Although this new logic will in essence implement what adding a COALESCE function for a join colunm might have done but it still allows following query to compile and run
      SELECT country,count(country) FROM
      COUNTRIES JOIN CITIES USING (COUNTRY) group by countries.country

      I think query above succeeds because in case of an INNER JOIN or LEFT OUTER JOIN, Derby associates the join column with the left table during it's bind phase. In case of RIGHT OUTER JOIN, Derby associates the join column with right table during it's bind phase. I believe, for these reasons, a query like above will not give an error for the group by column.

        Issue Links

          Activity

          Gavin made changes -
          Workflow jira [ 12653162 ] Default workflow, editable Closed status [ 12796805 ]
          Mamta A. Satoor made changes -
          Labels derby_triage10_9
          Urgency Normal [ 10052 ]
          Issue & fix info Release Note Needed,Repro attached [ 10101,10424 ]
          Hide
          Bryan Pendleton added a comment -

          Thanks! That's a good example, and clear. I'll go read the discussion in DERBY-4631 more carefully.

          Show
          Bryan Pendleton added a comment - Thanks! That's a good example, and clear. I'll go read the discussion in DERBY-4631 more carefully.
          Hide
          Knut Anders Hatlen added a comment -

          > If it's an inner join, aren't countries.country and cities.country equivalent? Is this issue only for outer joins (left or right)?

          This is an issue for inner joins too, I think, since you could have case-insensitive collation, so that countries.country is, say, 'Oman' and cities.country is 'OMAN', and countries.country=cities.country will still evaluate to true.

          > I'm not terribly familiar with USING syntax; is there a semantic difference between
          >
          > SELECT country,count(country) FROM COUNTRIES JOIN CITIES USING (COUNTRY) group by country
          >
          > and
          >
          > SELECT country,count(country) FROM COUNTRIES, cities where countries.country = cities.country group by country

          Yes, those two queries are not equivalent. The latter of these will raise a compile-time error: Column name 'COUNTRY' is in more than one table in the FROM list.

          The former is (supposed to be) equivalent to:

          SELECT country, count(country) from (SELECT COALESCE(countries.country, cities.country) AS country FROM countries, cities WHERE countries.country = cities.country) s GROUP BY country;

          Show
          Knut Anders Hatlen added a comment - > If it's an inner join, aren't countries.country and cities.country equivalent? Is this issue only for outer joins (left or right)? This is an issue for inner joins too, I think, since you could have case-insensitive collation, so that countries.country is, say, 'Oman' and cities.country is 'OMAN', and countries.country=cities.country will still evaluate to true. > I'm not terribly familiar with USING syntax; is there a semantic difference between > > SELECT country,count(country) FROM COUNTRIES JOIN CITIES USING (COUNTRY) group by country > > and > > SELECT country,count(country) FROM COUNTRIES, cities where countries.country = cities.country group by country Yes, those two queries are not equivalent. The latter of these will raise a compile-time error: Column name 'COUNTRY' is in more than one table in the FROM list. The former is (supposed to be) equivalent to: SELECT country, count(country) from (SELECT COALESCE(countries.country, cities.country) AS country FROM countries, cities WHERE countries.country = cities.country) s GROUP BY country;
          Hide
          Bryan Pendleton added a comment -

          If it's an inner join, aren't countries.country and cities.country equivalent? Is this issue only for outer joins (left or right)?

          I agree that we need to obey the sql spec's description of how the query should behave.

          I'm not terribly familiar with USING syntax; is there a semantic difference between

          SELECT country,count(country) FROM COUNTRIES JOIN CITIES USING (COUNTRY) group by country

          and

          SELECT country,count(country) FROM COUNTRIES, cities where countries.country = cities.country group by country

          Show
          Bryan Pendleton added a comment - If it's an inner join, aren't countries.country and cities.country equivalent? Is this issue only for outer joins (left or right)? I agree that we need to obey the sql spec's description of how the query should behave. I'm not terribly familiar with USING syntax; is there a semantic difference between SELECT country,count(country) FROM COUNTRIES JOIN CITIES USING (COUNTRY) group by country and SELECT country,count(country) FROM COUNTRIES, cities where countries.country = cities.country group by country
          Hide
          Mamta A. Satoor added a comment -

          But we haven't qualified the country column with the left table or right table. Instead it is a join column which as per SQL spec should be replaced by COALESCE, I haven't looked at the spec in detail on this, but if country in SELECT is to be replaced with COALESCE(COUNTRIES.county, CITIES.country), then should we be allowing join column country in SELECT to be bound to the COUNTRIES table? When we decide to work on this issue, we probably should look into SQL spec more about the expected behavior for such a query.

          If the query above was rewritten to have group by on CITIES.country, the query would fail because that column is not in the SELECT list.
          SELECT country,count(country) FROM COUNTRIES JOIN CITIES USING (COUNTRY) group by CITIES.country

          Currently, in Derby (based on my work on DERBY-4631), for a LEFT OUTER JOIN and INNER JOIN with USING/NATURAL, the join column in the SELECT list gets bound to the left table. For RIGHT OUTER JOIN with USING/NATURAL, the join column in the SELECT list gets bound to the right table. Based on what we find in the SQL spec, this binding logic for the join columns may or may not be correct.

          Show
          Mamta A. Satoor added a comment - But we haven't qualified the country column with the left table or right table. Instead it is a join column which as per SQL spec should be replaced by COALESCE, I haven't looked at the spec in detail on this, but if country in SELECT is to be replaced with COALESCE(COUNTRIES.county, CITIES.country), then should we be allowing join column country in SELECT to be bound to the COUNTRIES table? When we decide to work on this issue, we probably should look into SQL spec more about the expected behavior for such a query. If the query above was rewritten to have group by on CITIES.country, the query would fail because that column is not in the SELECT list. SELECT country,count(country) FROM COUNTRIES JOIN CITIES USING (COUNTRY) group by CITIES.country Currently, in Derby (based on my work on DERBY-4631 ), for a LEFT OUTER JOIN and INNER JOIN with USING/NATURAL, the join column in the SELECT list gets bound to the left table. For RIGHT OUTER JOIN with USING/NATURAL, the join column in the SELECT list gets bound to the right table. Based on what we find in the SQL spec, this binding logic for the join columns may or may not be correct.
          Hide
          Bryan Pendleton added a comment -

          > A query like following does not raise an error even though countries.country is not part of the SELECT column list.
          > SELECT country,count(country) FROM COUNTRIES JOIN CITIES USING (COUNTRY) group by countries.country

          I'm obviously missing your point, because the "country" column sure looks like it is in the select column list.

          Is the issue here whether "country" means "countries.country" or "cities.country"?

          Show
          Bryan Pendleton added a comment - > A query like following does not raise an error even though countries.country is not part of the SELECT column list. > SELECT country,count(country) FROM COUNTRIES JOIN CITIES USING (COUNTRY) group by countries.country I'm obviously missing your point, because the "country" column sure looks like it is in the select column list. Is the issue here whether "country" means "countries.country" or "cities.country"?
          Hide
          Mamta A. Satoor added a comment -

          There are already some tests showing the incorrect behavior in GroupByTest.java

          Show
          Mamta A. Satoor added a comment - There are already some tests showing the incorrect behavior in GroupByTest.java
          Mamta A. Satoor made changes -
          Field Original Value New Value
          Link This issue relates to DERBY-4631 [ DERBY-4631 ]
          Mamta A. Satoor created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Mamta A. Satoor
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:

                Development