Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3182

Window function with DISTINCT qualifier returns seemingly incorrect result

    XMLWordPrintableJSON

Details

    Description

      Both count(distinct <colname>) and count(all <colname>) return the same result. It does not look correct to me and I'm not sure what the correct behavior is going to be.

      (1) Latest postgres does not support distinct with Window functions:

      postgres=# select a2, count(distinct b2) over(partition by a2) from t2;
      ERROR: DISTINCT is not implemented for window functions
      LINE 1: select a2, count(distinct b2) over(partition by a2) from t2;
      ^
      postgres=# select a2, avg(distinct a2) over(partition by a2) from t2;
      ERROR: DISTINCT is not implemented for window functions
      LINE 1: select a2, avg(distinct a2) over(partition by a2) from t2;

      (2) Calcite does not support this either: https://github.com/apache/incubator-calcite/blob/master/doc/reference.md

      Do we support it ? If not, I think we should throw an error ...

      0: jdbc:drill:schema=dfs> select * from t2;
      +-----+--------+-------------+
      | a2  |   b2   |     c2      |
      +-----+--------+-------------+
      | 0   | zzz    | 2014-12-31  |
      | 1   | aaaaa  | 2015-01-01  |
      | 2   | bbbbb  | 2015-01-02  |
      | 2   | bbbbb  | 2015-01-02  |
      | 2   | bbbbb  | 2015-01-02  |
      | 3   | ccccc  | 2015-01-03  |
      | 4   | ddddd  | 2015-01-04  |
      | 5   | eeeee  | 2015-01-05  |
      | 6   | fffff  | 2015-01-06  |
      | 7   | ggggg  | 2015-01-07  |
      | 7   | ggggg  | 2015-01-07  |
      | 8   | hhhhh  | 2015-01-08  |
      | 9   | iiiii  | 2015-01-09  |
      +-----+--------+-------------+
      13 rows selected (0.134 seconds)
      
      0: jdbc:drill:schema=dfs> select a2, count(distinct b2) over(partition by a2) from t2;
      +-----+---------+
      | a2  | EXPR$1  |
      +-----+---------+
      | 0   | 1       |
      | 1   | 1       |
      | 2   | 3       |
      | 2   | 3       |
      | 2   | 3       |
      | 3   | 1       |
      | 4   | 1       |
      | 5   | 1       |
      | 6   | 1       |
      | 7   | 2       |
      | 7   | 2       |
      | 8   | 1       |
      | 9   | 1       |
      +-----+---------+
      13 rows selected (0.224 seconds)
      
      0: jdbc:drill:schema=dfs> select a2, count(b2) over(partition by a2) from t2;
      +-----+---------+
      | a2  | EXPR$1  |
      +-----+---------+
      | 0   | 1       |
      | 1   | 1       |
      | 2   | 3       |
      | 2   | 3       |
      | 2   | 3       |
      | 3   | 1       |
      | 4   | 1       |
      | 5   | 1       |
      | 6   | 1       |
      | 7   | 2       |
      | 7   | 2       |
      | 8   | 1       |
      | 9   | 1       |
      +-----+---------+
      13 rows selected (0.219 seconds)
      

      Attachments

        1. DRILL-3182.1.patch
          13 kB
          Sean Hsuan-Yi Chu

        Activity

          People

            seanhychu Sean Hsuan-Yi Chu
            vicky Victoria Markman
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: