Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3182

Window function with DISTINCT qualifier returns seemingly incorrect result

    Details

      Description

      Both count(distinct <colname>) and count(all <colname>) return the same result. It does not look correct to me and I'm not sure what the correct behavior is going to be.

      (1) Latest postgres does not support distinct with Window functions:

      postgres=# select a2, count(distinct b2) over(partition by a2) from t2;
      ERROR: DISTINCT is not implemented for window functions
      LINE 1: select a2, count(distinct b2) over(partition by a2) from t2;
      ^
      postgres=# select a2, avg(distinct a2) over(partition by a2) from t2;
      ERROR: DISTINCT is not implemented for window functions
      LINE 1: select a2, avg(distinct a2) over(partition by a2) from t2;

      (2) Calcite does not support this either: https://github.com/apache/incubator-calcite/blob/master/doc/reference.md

      Do we support it ? If not, I think we should throw an error ...

      0: jdbc:drill:schema=dfs> select * from t2;
      +-----+--------+-------------+
      | a2  |   b2   |     c2      |
      +-----+--------+-------------+
      | 0   | zzz    | 2014-12-31  |
      | 1   | aaaaa  | 2015-01-01  |
      | 2   | bbbbb  | 2015-01-02  |
      | 2   | bbbbb  | 2015-01-02  |
      | 2   | bbbbb  | 2015-01-02  |
      | 3   | ccccc  | 2015-01-03  |
      | 4   | ddddd  | 2015-01-04  |
      | 5   | eeeee  | 2015-01-05  |
      | 6   | fffff  | 2015-01-06  |
      | 7   | ggggg  | 2015-01-07  |
      | 7   | ggggg  | 2015-01-07  |
      | 8   | hhhhh  | 2015-01-08  |
      | 9   | iiiii  | 2015-01-09  |
      +-----+--------+-------------+
      13 rows selected (0.134 seconds)
      
      0: jdbc:drill:schema=dfs> select a2, count(distinct b2) over(partition by a2) from t2;
      +-----+---------+
      | a2  | EXPR$1  |
      +-----+---------+
      | 0   | 1       |
      | 1   | 1       |
      | 2   | 3       |
      | 2   | 3       |
      | 2   | 3       |
      | 3   | 1       |
      | 4   | 1       |
      | 5   | 1       |
      | 6   | 1       |
      | 7   | 2       |
      | 7   | 2       |
      | 8   | 1       |
      | 9   | 1       |
      +-----+---------+
      13 rows selected (0.224 seconds)
      
      0: jdbc:drill:schema=dfs> select a2, count(b2) over(partition by a2) from t2;
      +-----+---------+
      | a2  | EXPR$1  |
      +-----+---------+
      | 0   | 1       |
      | 1   | 1       |
      | 2   | 3       |
      | 2   | 3       |
      | 2   | 3       |
      | 3   | 1       |
      | 4   | 1       |
      | 5   | 1       |
      | 6   | 1       |
      | 7   | 2       |
      | 7   | 2       |
      | 8   | 1       |
      | 9   | 1       |
      +-----+---------+
      13 rows selected (0.219 seconds)
      

        Attachments

        1. DRILL-3182.1.patch
          13 kB
          Sean Hsuan-Yi Chu

          Activity

            People

            • Assignee:
              seanhychu Sean Hsuan-Yi Chu
              Reporter:
              vicky Victoria Markman
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: