Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3652

Need to document order of operations with window functions and flatten

VotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.2.0
    • 1.2.0
    • Documentation

    Description

      In standard SQL, window functions are the last set of operations performed in a query except for the final order by clause.
      Using window function with flatten is a bit confusing, because it appears as an operator in the query plan and I expected flatten to run first followed by a window function.

      This is not what is happening:

      0: jdbc:drill:schema=dfs> select * from `complex.json`;
      +----+-----------+----------+
      | x  |     y     |    z     |
      +----+-----------+----------+
      | 5  | a string  | [1,2,3]  |
      +----+-----------+----------+
      1 row selected (0.128 seconds)
      
      0: jdbc:drill:schema=dfs> select sum(x) over(), x , y, flatten(z) from `complex.json`;
      +---------+----+-----------+---------+
      | EXPR$0  | x  |     y     | EXPR$3  |
      +---------+----+-----------+---------+
      | 5       | 5  | a string  | 1       |
      | 5       | 5  | a string  | 2       |
      | 5       | 5  | a string  | 3       |
      +---------+----+-----------+---------+
      3 rows selected (0.152 seconds)
      
      0: jdbc:drill:schema=dfs> explain plan for select sum(x) over(), x , y, flatten(z) from `complex.json`;
      +------+------+
      | text | json |
      +------+------+
      | 00-00    Screen
      00-01      ProjectAllowDup(EXPR$0=[$0], x=[$1], y=[$2], EXPR$3=[$3])
      00-02        Project(w0$o0=[$3], x=[$0], y=[$1], EXPR$3=[$4])
      00-03          Flatten(flattenField=[$4])
      00-04            Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3], EXPR$5=[$2])
      00-05              Project(x=[$1], y=[$2], z=[$3], w0$o0=[$4])
      00-06                Window(window#0=[window(partition {} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])])
      00-07                  Project(T38¦¦*=[$0], x=[$1], y=[$2], z=[$3])
      00-08                    Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/subqueries/complex.json, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/subqueries/complex.json]]]
      

      We should suggest to users to put flatten in a subquery if they want to run window function on top of the result set returned by flatten.

      0: jdbc:drill:schema=dfs> select x, y, a, sum(x) over() from  ( select x , y, flatten(z) as a from `complex.json`);
      +----+-----------+----+---------+
      | x  |     y     | a  | EXPR$3  |
      +----+-----------+----+---------+
      | 5  | a string  | 1  | 15      |
      | 5  | a string  | 2  | 15      |
      | 5  | a string  | 3  | 15      |
      +----+-----------+----+---------+
      3 rows selected (0.145 seconds)
      

      I suggest we document this issue in the window function section, perhaps in "Usage notes".

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            bbevens Bridget Bevens
            vicky Victoria Markman
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment