Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.13.0
-
None
Description
Currently we have early limit 0 optimization (planner.enable_limit0_optimization) which determines query data types before actual scan. Since we not always able to determine data type during planning, we need to add one more option to enable late limit 0 optimization (planner.enable_limit0_on_scan, exit query right after scan. LIMIT(0) on SCAN for UNION and complex functions will be disabled i.e. UNION and complex functions need data to produce result schema. This would not work for the following list of functions: KVGEN, MAPPIFY, FLATTEN, CONVERT_FROMJSON, CONVERT_TOJSON, CONVERT_TOSIMPLEJSON, CONVERT_TOEXTENDEDJSON.
Query plan examples:
For query
SELECT * FROM ( SELECT l.l_quantity, l.l_shipdate, o.o_custkey FROM cp.`tpch/lineitem.parquet` l JOIN cp.`tpch/orders.parquet` o ON l.l_orderkey = o.o_orderkey LIMIT 2) LIMIT 0
plan after changes looks like
00-00 Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75183.1 rows, 210559.1 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 527 00-01 Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75183.0 rows, 210559.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 526 00-02 SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75182.0 rows, 210556.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 525 00-03 Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 524 00-04 Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 523 00-05 HashJoin(condition=[=($0, $3)], joinType=[inner]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75179.0 rows, 210547.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 522 00-07 SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, cumulative cost = \{60176.0 rows, 180526.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 518 00-09 Limit(offset=[0], fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 517 00-11 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 516 00-06 SelectionVectorRemover : rowType = RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{15001.0 rows, 30001.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 521 00-08 Limit(offset=[0], fetch=[0]) : rowType = RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 520 00-10 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 519
and before changes:
00-00 Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount = 1.0, cumulative cost = \{150354.1 rows, 1052637.1 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 452 00-01 Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount = 1.0, cumulative cost = \{150354.0 rows, 1052637.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 451 00-02 SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{150353.0 rows, 1052634.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 450 00-03 Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 449 00-04 Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 448 00-05 HashJoin(condition=[=($0, $3)], joinType=[inner]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 60175.0, cumulative cost = \{150350.0 rows, 1052625.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 447 00-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 445 00-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
Also both early and late limit 0 optimizations will be enabled by default.
Attachments
Issue Links
- Blocked
-
DRILL-6606 Hash Join returns incorrect data types when joining subqueries with limit 0
- Resolved
- links to