Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-634

Allow ORDER BY aggregate function in SELECT DISTINCT, provided that it occurs in SELECT clause

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0-incubating
    • Component/s: None
    • Labels:
      None

      Description

      For example, the query

      select distinct count(empno) from emp
      group by empno
      order by 1
      

      gave:

      org.apache.calcite.sql.validate.SqlValidatorException <init>
      SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Expression 'COUNT(`EMP`.`EMPNO`)' is not in the select clause

      1. CALCITE-634.1.patch
        3 kB
        Sean Hsuan-Yi Chu

        Activity

        Hide
        jni Jinfeng Ni added a comment -

        I think the reason that you saw such SqlValidatorException is that Calcite does not allow ordinal reference in GROUP BY. Ordinal reference is only allowed in ORDER BY clause.

        Some systems, e.g Postgre, do allow ordinal reference in GROUP BY. The following query works in poster 9.3.

        select distinct dept_name from dept group by 1;
         dept_name
        -----------
         abc
        

        Seems it's desirable to open this restriction in Calcite, and allow ordinal reference in GROUP BY clause. Julian Hyde, what's your opinion on this ?

        Show
        jni Jinfeng Ni added a comment - I think the reason that you saw such SqlValidatorException is that Calcite does not allow ordinal reference in GROUP BY. Ordinal reference is only allowed in ORDER BY clause. Some systems, e.g Postgre, do allow ordinal reference in GROUP BY. The following query works in poster 9.3. select distinct dept_name from dept group by 1; dept_name ----------- abc Seems it's desirable to open this restriction in Calcite, and allow ordinal reference in GROUP BY clause. Julian Hyde , what's your opinion on this ?
        Hide
        seanhychu Sean Hsuan-Yi Chu added a comment -

        In the original query, ordinal reference is in order by actually.

        The exception is thrown here: AggChecker.java (line 117).

        And it seems Calcite does not allow aggregation function to be in ORDER BY clause if there is SELECT DISTINCT.

        However, as, in this query, this aggregation function is simply an entity in the select clause,
        calcite might be less restrictive?

        Show
        seanhychu Sean Hsuan-Yi Chu added a comment - In the original query, ordinal reference is in order by actually. The exception is thrown here: AggChecker.java (line 117). And it seems Calcite does not allow aggregation function to be in ORDER BY clause if there is SELECT DISTINCT. However, as, in this query, this aggregation function is simply an entity in the select clause, calcite might be less restrictive?
        Hide
        julianhyde Julian Hyde added a comment -

        Two issues here.

        1. Yes, this is a bug. The original query "select distinct count(empno) from emp group by empno order by 1" is valid SQL and Calcite should give the correct answer.

        (I can't think of a case where you would actually want to write this query "count(distinct empno)" is much more common than "distinct count(empno)".)

        2. Whether to allow ordinals in GROUP BY. I am not particularly inclined to. First of all, it is not standard SQL. Second, there are semantic problems because SELECT is evaluated after GROUP BY. Third, although it is onerous to type long expressions in both the SELECT and GROUP BY clause, there is a workaround, namely to create a sub-query in the FROM clause or use a WITH clause. I would consider accepting it as an optional feature, disabled by default, if it was supported by tests that dealt with all of the semantic issues. If you want to pursue this, please open a new jira case.

        Show
        julianhyde Julian Hyde added a comment - Two issues here. 1. Yes, this is a bug. The original query "select distinct count(empno) from emp group by empno order by 1" is valid SQL and Calcite should give the correct answer. (I can't think of a case where you would actually want to write this query "count(distinct empno)" is much more common than "distinct count(empno)".) 2. Whether to allow ordinals in GROUP BY. I am not particularly inclined to. First of all, it is not standard SQL. Second, there are semantic problems because SELECT is evaluated after GROUP BY. Third, although it is onerous to type long expressions in both the SELECT and GROUP BY clause, there is a workaround, namely to create a sub-query in the FROM clause or use a WITH clause. I would consider accepting it as an optional feature, disabled by default, if it was supported by tests that dealt with all of the semantic issues. If you want to pursue this, please open a new jira case.
        Hide
        seanhychu Sean Hsuan-Yi Chu added a comment -

        A patch to address this issue

        Show
        seanhychu Sean Hsuan-Yi Chu added a comment - A patch to address this issue
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/5e285a16 .
        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.2.0-incubating (2015-04-16)

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.2.0-incubating (2015-04-16)

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            seanhychu Sean Hsuan-Yi Chu
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development