Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3657

Wrong result with SUM(1) window function when multiple partitions are present

VotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      The common use case for this would be: SUM( CASE WHEN x=y THEN 1 ELSE 0 END)

      Wrong result:

      0: jdbc:drill:schema=dfs> select
      . . . . . . . . . . . . >         b2,
      . . . . . . . . . . . . >         c2,
      . . . . . . . . . . . . >         sum(1)  over(partition by b2 order by c2),
      . . . . . . . . . . . . >         sum(1)  over(partition by c2)
      . . . . . . . . . . . . > from
      . . . . . . . . . . . . >         t2
      . . . . . . . . . . . . > order by
      . . . . . . . . . . . . >         1,2;
      +--------+-------------+---------+---------+
      |   b2   |     c2      | EXPR$2  | EXPR$3  |
      +--------+-------------+---------+---------+
      | aaaaa  | 2015-01-01  | 1       | 1       |
      | bbbbb  | 2015-01-02  | 3       | 9       |
      | bbbbb  | 2015-01-02  | 3       | 9       |
      | bbbbb  | 2015-01-02  | 3       | 9       |
      | ccccc  | 2015-01-03  | 1       | 1       |
      | ddddd  | 2015-01-04  | 1       | 1       |
      | eeeee  | 2015-01-05  | 1       | 1       |
      | fffff  | 2015-01-06  | 1       | 1       |
      | ggggg  | 2015-01-07  | 2       | 4       |
      | ggggg  | 2015-01-07  | 2       | 4       |
      | hhhhh  | 2015-01-08  | 1       | 1       |
      | iiiii  | 2015-01-09  | 1       | 1       |
      | zzz    | 2014-12-31  | 1       | 1       |
      +--------+-------------+---------+---------+
      13 rows selected (0.204 seconds)
      

      Explain plan:

      | 00-00    Screen
      00-01      Project(b2=[$0], c2=[$1], EXPR$2=[$2], EXPR$3=[$3])
      00-02        SelectionVectorRemover
      00-03          Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
      00-04            Window(window#0=[window(partition {1} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($2)])])
      00-05              SelectionVectorRemover
      00-06                Sort(sort0=[$1], dir0=[ASC])
      00-07                  Window(window#0=[window(partition {0} order by [1] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [SUM($2)])])
      00-08                    SelectionVectorRemover
      00-09                      Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
      00-10                        Project(b2=[$1], c2=[$0])
      00-11                          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/t2]], selectionRoot=maprfs:/drill/testdata/subqueries/t2, numFiles=1, columns=[`b2`, `c2`]]])
      

      If you have a query with only one of these partitions, result is correct :

      0: jdbc:drill:schema=dfs> select
      . . . . . . . . . . . . >         b2,
      . . . . . . . . . . . . >         c2,
      . . . . . . . . . . . . >         sum(1)  over(partition by b2 order by c2)
      . . . . . . . . . . . . > from
      . . . . . . . . . . . . >         t2
      . . . . . . . . . . . . > order by
      . . . . . . . . . . . . >         1,2;
      +--------+-------------+---------+
      |   b2   |     c2      | EXPR$2  |
      +--------+-------------+---------+
      | aaaaa  | 2015-01-01  | 1       |
      | bbbbb  | 2015-01-02  | 3       |
      | bbbbb  | 2015-01-02  | 3       |
      | bbbbb  | 2015-01-02  | 3       |
      | ccccc  | 2015-01-03  | 1       |
      | ddddd  | 2015-01-04  | 1       |
      | eeeee  | 2015-01-05  | 1       |
      | fffff  | 2015-01-06  | 1       |
      | ggggg  | 2015-01-07  | 2       |
      | ggggg  | 2015-01-07  | 2       |
      | hhhhh  | 2015-01-08  | 1       |
      | iiiii  | 2015-01-09  | 1       |
      | zzz    | 2014-12-31  | 1       |
      +--------+-------------+---------+
      13 rows selected (0.196 seconds)
      
      0: jdbc:drill:schema=dfs> select
      . . . . . . . . . . . . >         b2,
      . . . . . . . . . . . . >         c2,
      . . . . . . . . . . . . >         sum(1)  over(partition by c2)
      . . . . . . . . . . . . > from
      . . . . . . . . . . . . >         t2
      . . . . . . . . . . . . > order by
      . . . . . . . . . . . . >         1,2;
      +--------+-------------+---------+
      |   b2   |     c2      | EXPR$2  |
      +--------+-------------+---------+
      | aaaaa  | 2015-01-01  | 1       |
      | bbbbb  | 2015-01-02  | 3       |
      | bbbbb  | 2015-01-02  | 3       |
      | bbbbb  | 2015-01-02  | 3       |
      | ccccc  | 2015-01-03  | 1       |
      | ddddd  | 2015-01-04  | 1       |
      | eeeee  | 2015-01-05  | 1       |
      | fffff  | 2015-01-06  | 1       |
      | ggggg  | 2015-01-07  | 2       |
      | ggggg  | 2015-01-07  | 2       |
      | hhhhh  | 2015-01-08  | 1       |
      | iiiii  | 2015-01-09  | 1       |
      | zzz    | 2014-12-31  | 1       |
      +--------+-------------+---------+
      13 rows selected (0.179 seconds)
      

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            seanhychu Sean Hsuan-Yi Chu
            vicky Victoria Markman
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment