Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3657

Wrong result with SUM(1) window function when multiple partitions are present

    XMLWordPrintableJSON

Details

    Description

      The common use case for this would be: SUM( CASE WHEN x=y THEN 1 ELSE 0 END)

      Wrong result:

      0: jdbc:drill:schema=dfs> select
      . . . . . . . . . . . . >         b2,
      . . . . . . . . . . . . >         c2,
      . . . . . . . . . . . . >         sum(1)  over(partition by b2 order by c2),
      . . . . . . . . . . . . >         sum(1)  over(partition by c2)
      . . . . . . . . . . . . > from
      . . . . . . . . . . . . >         t2
      . . . . . . . . . . . . > order by
      . . . . . . . . . . . . >         1,2;
      +--------+-------------+---------+---------+
      |   b2   |     c2      | EXPR$2  | EXPR$3  |
      +--------+-------------+---------+---------+
      | aaaaa  | 2015-01-01  | 1       | 1       |
      | bbbbb  | 2015-01-02  | 3       | 9       |
      | bbbbb  | 2015-01-02  | 3       | 9       |
      | bbbbb  | 2015-01-02  | 3       | 9       |
      | ccccc  | 2015-01-03  | 1       | 1       |
      | ddddd  | 2015-01-04  | 1       | 1       |
      | eeeee  | 2015-01-05  | 1       | 1       |
      | fffff  | 2015-01-06  | 1       | 1       |
      | ggggg  | 2015-01-07  | 2       | 4       |
      | ggggg  | 2015-01-07  | 2       | 4       |
      | hhhhh  | 2015-01-08  | 1       | 1       |
      | iiiii  | 2015-01-09  | 1       | 1       |
      | zzz    | 2014-12-31  | 1       | 1       |
      +--------+-------------+---------+---------+
      13 rows selected (0.204 seconds)
      

      Explain plan:

      | 00-00    Screen
      00-01      Project(b2=[$0], c2=[$1], EXPR$2=[$2], EXPR$3=[$3])
      00-02        SelectionVectorRemover
      00-03          Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
      00-04            Window(window#0=[window(partition {1} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($2)])])
      00-05              SelectionVectorRemover
      00-06                Sort(sort0=[$1], dir0=[ASC])
      00-07                  Window(window#0=[window(partition {0} order by [1] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [SUM($2)])])
      00-08                    SelectionVectorRemover
      00-09                      Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
      00-10                        Project(b2=[$1], c2=[$0])
      00-11                          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/t2]], selectionRoot=maprfs:/drill/testdata/subqueries/t2, numFiles=1, columns=[`b2`, `c2`]]])
      

      If you have a query with only one of these partitions, result is correct :

      0: jdbc:drill:schema=dfs> select
      . . . . . . . . . . . . >         b2,
      . . . . . . . . . . . . >         c2,
      . . . . . . . . . . . . >         sum(1)  over(partition by b2 order by c2)
      . . . . . . . . . . . . > from
      . . . . . . . . . . . . >         t2
      . . . . . . . . . . . . > order by
      . . . . . . . . . . . . >         1,2;
      +--------+-------------+---------+
      |   b2   |     c2      | EXPR$2  |
      +--------+-------------+---------+
      | aaaaa  | 2015-01-01  | 1       |
      | bbbbb  | 2015-01-02  | 3       |
      | bbbbb  | 2015-01-02  | 3       |
      | bbbbb  | 2015-01-02  | 3       |
      | ccccc  | 2015-01-03  | 1       |
      | ddddd  | 2015-01-04  | 1       |
      | eeeee  | 2015-01-05  | 1       |
      | fffff  | 2015-01-06  | 1       |
      | ggggg  | 2015-01-07  | 2       |
      | ggggg  | 2015-01-07  | 2       |
      | hhhhh  | 2015-01-08  | 1       |
      | iiiii  | 2015-01-09  | 1       |
      | zzz    | 2014-12-31  | 1       |
      +--------+-------------+---------+
      13 rows selected (0.196 seconds)
      
      0: jdbc:drill:schema=dfs> select
      . . . . . . . . . . . . >         b2,
      . . . . . . . . . . . . >         c2,
      . . . . . . . . . . . . >         sum(1)  over(partition by c2)
      . . . . . . . . . . . . > from
      . . . . . . . . . . . . >         t2
      . . . . . . . . . . . . > order by
      . . . . . . . . . . . . >         1,2;
      +--------+-------------+---------+
      |   b2   |     c2      | EXPR$2  |
      +--------+-------------+---------+
      | aaaaa  | 2015-01-01  | 1       |
      | bbbbb  | 2015-01-02  | 3       |
      | bbbbb  | 2015-01-02  | 3       |
      | bbbbb  | 2015-01-02  | 3       |
      | ccccc  | 2015-01-03  | 1       |
      | ddddd  | 2015-01-04  | 1       |
      | eeeee  | 2015-01-05  | 1       |
      | fffff  | 2015-01-06  | 1       |
      | ggggg  | 2015-01-07  | 2       |
      | ggggg  | 2015-01-07  | 2       |
      | hhhhh  | 2015-01-08  | 1       |
      | iiiii  | 2015-01-09  | 1       |
      | zzz    | 2014-12-31  | 1       |
      +--------+-------------+---------+
      13 rows selected (0.179 seconds)
      

      Attachments

        Activity

          People

            seanhychu Sean Hsuan-Yi Chu
            vicky Victoria Markman
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: