Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.2.0
Description
In standard SQL, window functions are the last set of operations performed in a query except for the final order by clause.
Using window function with flatten is a bit confusing, because it appears as an operator in the query plan and I expected flatten to run first followed by a window function.
This is not what is happening:
0: jdbc:drill:schema=dfs> select * from `complex.json`; +----+-----------+----------+ | x | y | z | +----+-----------+----------+ | 5 | a string | [1,2,3] | +----+-----------+----------+ 1 row selected (0.128 seconds) 0: jdbc:drill:schema=dfs> select sum(x) over(), x , y, flatten(z) from `complex.json`; +---------+----+-----------+---------+ | EXPR$0 | x | y | EXPR$3 | +---------+----+-----------+---------+ | 5 | 5 | a string | 1 | | 5 | 5 | a string | 2 | | 5 | 5 | a string | 3 | +---------+----+-----------+---------+ 3 rows selected (0.152 seconds) 0: jdbc:drill:schema=dfs> explain plan for select sum(x) over(), x , y, flatten(z) from `complex.json`; +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 ProjectAllowDup(EXPR$0=[$0], x=[$1], y=[$2], EXPR$3=[$3]) 00-02 Project(w0$o0=[$3], x=[$0], y=[$1], EXPR$3=[$4]) 00-03 Flatten(flattenField=[$4]) 00-04 Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3], EXPR$5=[$2]) 00-05 Project(x=[$1], y=[$2], z=[$3], w0$o0=[$4]) 00-06 Window(window#0=[window(partition {} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])]) 00-07 Project(T38¦¦*=[$0], x=[$1], y=[$2], z=[$3]) 00-08 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/subqueries/complex.json, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/subqueries/complex.json]]]
We should suggest to users to put flatten in a subquery if they want to run window function on top of the result set returned by flatten.
0: jdbc:drill:schema=dfs> select x, y, a, sum(x) over() from ( select x , y, flatten(z) as a from `complex.json`); +----+-----------+----+---------+ | x | y | a | EXPR$3 | +----+-----------+----+---------+ | 5 | a string | 1 | 15 | | 5 | a string | 2 | 15 | | 5 | a string | 3 | 15 | +----+-----------+----+---------+ 3 rows selected (0.145 seconds)
I suggest we document this issue in the window function section, perhaps in "Usage notes".