Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3652

Need to document order of operations with window functions and flatten

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.0
    • Component/s: Documentation
    • Labels:

      Description

      In standard SQL, window functions are the last set of operations performed in a query except for the final order by clause.
      Using window function with flatten is a bit confusing, because it appears as an operator in the query plan and I expected flatten to run first followed by a window function.

      This is not what is happening:

      0: jdbc:drill:schema=dfs> select * from `complex.json`;
      +----+-----------+----------+
      | x  |     y     |    z     |
      +----+-----------+----------+
      | 5  | a string  | [1,2,3]  |
      +----+-----------+----------+
      1 row selected (0.128 seconds)
      
      0: jdbc:drill:schema=dfs> select sum(x) over(), x , y, flatten(z) from `complex.json`;
      +---------+----+-----------+---------+
      | EXPR$0  | x  |     y     | EXPR$3  |
      +---------+----+-----------+---------+
      | 5       | 5  | a string  | 1       |
      | 5       | 5  | a string  | 2       |
      | 5       | 5  | a string  | 3       |
      +---------+----+-----------+---------+
      3 rows selected (0.152 seconds)
      
      0: jdbc:drill:schema=dfs> explain plan for select sum(x) over(), x , y, flatten(z) from `complex.json`;
      +------+------+
      | text | json |
      +------+------+
      | 00-00    Screen
      00-01      ProjectAllowDup(EXPR$0=[$0], x=[$1], y=[$2], EXPR$3=[$3])
      00-02        Project(w0$o0=[$3], x=[$0], y=[$1], EXPR$3=[$4])
      00-03          Flatten(flattenField=[$4])
      00-04            Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3], EXPR$5=[$2])
      00-05              Project(x=[$1], y=[$2], z=[$3], w0$o0=[$4])
      00-06                Window(window#0=[window(partition {} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])])
      00-07                  Project(T38¦¦*=[$0], x=[$1], y=[$2], z=[$3])
      00-08                    Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/subqueries/complex.json, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/subqueries/complex.json]]]
      

      We should suggest to users to put flatten in a subquery if they want to run window function on top of the result set returned by flatten.

      0: jdbc:drill:schema=dfs> select x, y, a, sum(x) over() from  ( select x , y, flatten(z) as a from `complex.json`);
      +----+-----------+----+---------+
      | x  |     y     | a  | EXPR$3  |
      +----+-----------+----+---------+
      | 5  | a string  | 1  | 15      |
      | 5  | a string  | 2  | 15      |
      | 5  | a string  | 3  | 15      |
      +----+-----------+----+---------+
      3 rows selected (0.145 seconds)
      

      I suggest we document this issue in the window function section, perhaps in "Usage notes".

        Attachments

          Activity

            People

            • Assignee:
              bbevens Bridget Bevens
              Reporter:
              vicky Victoria Markman
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: