Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Plan for the query with unnest is chosen non-optimally:
select customer.c_custkey, customer.c_name, t.o.o_orderkey,t.o.o_totalprice from dfs.`lateraljoin/multipleFiles` customer, unnest(customer.c_orders) t(o) order by customer.c_custkey, t.o.o_orderkey, t.o.o_totalprice limit 50
Plan:
00-00 Screen 00-01 ProjectAllowDup(c_custkey=[$0], c_name=[$1], EXPR$2=[$2], EXPR$3=[$3]) 00-02 SelectionVectorRemover 00-03 Limit(fetch=[50]) 00-04 SelectionVectorRemover 00-05 Sort(sort0=[$0], sort1=[$2], sort2=[$3], dir0=[ASC], dir1=[ASC], dir2=[ASC]) 00-06 Project(c_custkey=[$2], c_name=[$3], EXPR$2=[ITEM($4, 'o_orderkey')], EXPR$3=[ITEM($4, 'o_totalprice')]) 00-07 LateralJoin(correlation=[$cor0], joinType=[inner], requiredColumns=[{1}]) 00-09 Project(T0¦¦**=[$0], c_orders=[$1], c_custkey=[$2], c_name=[$3]) 00-11 Scan(groupscan=[EasyGroupScan [selectionRoot=file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles, numFiles=2, columns=[`**`], files=[file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_2.json, file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_1.json]]]) 00-08 Project(c_orders0=[$0]) 00-10 Unnest [srcOp=00-07]
A similar query, but with flatten:
select f.c_custkey, f.c_name, f.o.o_orderkey, f.o.o_totalprice from (select c_custkey, c_name, flatten(c_orders) as o from dfs.`lateraljoin/multipleFiles` customer) f order by f.c_custkey, f.o.o_orderkey, f.o.o_totalprice limit 50
has plan:
00-00 Screen 00-01 Project(c_custkey=[$0], c_name=[$1], EXPR$2=[$2], EXPR$3=[$3]) 00-02 SelectionVectorRemover 00-03 Limit(fetch=[50]) 00-04 SelectionVectorRemover 00-05 TopN(limit=[50]) 00-06 Project(c_custkey=[$0], c_name=[$1], EXPR$2=[ITEM($2, 'o_orderkey')], EXPR$3=[ITEM($2, 'o_totalprice')]) 00-07 Flatten(flattenField=[$2]) 00-08 Project(c_custkey=[$0], c_name=[$1], o=[$2]) 00-09 Scan(groupscan=[EasyGroupScan [selectionRoot=file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles, numFiles=2, columns=[`c_custkey`, `c_name`, `c_orders`], files=[file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_2.json, file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_1.json]]])
The main difference is that for the case of unnest, a project wasn't pushed to the scan and Limit with Sort weren't converted to TopN.
The first problem is tracked by DRILL-6545 and this Jira aims to fix the problem with TopN
Attachments
Issue Links
- is depended upon by
-
DRILL-5999 Add support for LATERAL join
- In Progress
- links to