[DRILL-3652] Need to document order of operations with window functions and flatten - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.2.0
Fix Version/s: 1.2.0
Component/s: Documentation
Labels:
- window_function

Description

In standard SQL, window functions are the last set of operations performed in a query except for the final order by clause.
Using window function with flatten is a bit confusing, because it appears as an operator in the query plan and I expected flatten to run first followed by a window function.

This is not what is happening:

0: jdbc:drill:schema=dfs> select * from `complex.json`;
+----+-----------+----------+
| x  |     y     |    z     |
+----+-----------+----------+
| 5  | a string  | [1,2,3]  |
+----+-----------+----------+
1 row selected (0.128 seconds)

0: jdbc:drill:schema=dfs> select sum(x) over(), x , y, flatten(z) from `complex.json`;
+---------+----+-----------+---------+
| EXPR$0  | x  |     y     | EXPR$3  |
+---------+----+-----------+---------+
| 5       | 5  | a string  | 1       |
| 5       | 5  | a string  | 2       |
| 5       | 5  | a string  | 3       |
+---------+----+-----------+---------+
3 rows selected (0.152 seconds)

0: jdbc:drill:schema=dfs> explain plan for select sum(x) over(), x , y, flatten(z) from `complex.json`;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      ProjectAllowDup(EXPR$0=[$0], x=[$1], y=[$2], EXPR$3=[$3])
00-02        Project(w0$o0=[$3], x=[$0], y=[$1], EXPR$3=[$4])
00-03          Flatten(flattenField=[$4])
00-04            Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3], EXPR$5=[$2])
00-05              Project(x=[$1], y=[$2], z=[$3], w0$o0=[$4])
00-06                Window(window#0=[window(partition {} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])])
00-07                  Project(T38¦¦*=[$0], x=[$1], y=[$2], z=[$3])
00-08                    Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/subqueries/complex.json, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/subqueries/complex.json]]]

We should suggest to users to put flatten in a subquery if they want to run window function on top of the result set returned by flatten.

0: jdbc:drill:schema=dfs> select x, y, a, sum(x) over() from  ( select x , y, flatten(z) as a from `complex.json`);
+----+-----------+----+---------+
| x  |     y     | a  | EXPR$3  |
+----+-----------+----+---------+
| 5  | a string  | 1  | 15      |
| 5  | a string  | 2  | 15      |
| 5  | a string  | 3  | 15      |
+----+-----------+----+---------+
3 rows selected (0.145 seconds)

I suggest we document this issue in the window function section, perhaps in "Usage notes".

Attachments

Activity

People

Assignee:: Bridget Bevens

Reporter:: Victoria Markman

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 14/Aug/15 22:38

Updated:: 07/Oct/15 21:48

Resolved:: 07/Oct/15 21:47