Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-6574

Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13.0
    • Fix Version/s: 1.14.0
    • Component/s: None

      Description

      Currently we have early limit 0 optimization (planner.enable_limit0_optimization) which determines query data types before actual scan. Since we not always able to determine data type during planning, we need to add one more option to enable late limit 0 optimization (planner.enable_limit0_on_scan, exit query right after scan. LIMIT(0) on SCAN for UNION and complex functions will be disabled i.e. UNION and complex functions need data to produce result schema. This would not work for the following list of functions: KVGEN, MAPPIFY, FLATTEN, CONVERT_FROMJSON, CONVERT_TOJSON, CONVERT_TOSIMPLEJSON, CONVERT_TOEXTENDEDJSON.

      Query plan examples:

      For query

      SELECT * FROM (
        SELECT l.l_quantity, l.l_shipdate, o.o_custkey 
        FROM cp.`tpch/lineitem.parquet` l 
        JOIN cp.`tpch/orders.parquet` o ON l.l_orderkey = o.o_orderkey 
        LIMIT 2) 
      LIMIT 0
      

      plan after changes looks like

      00-00    Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75183.1 rows, 210559.1 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 527
      00-01      Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75183.0 rows, 210559.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 526
      00-02        SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75182.0 rows, 210556.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 525
      00-03          Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 524
      00-04            Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 523
      00-05              HashJoin(condition=[=($0, $3)], joinType=[inner]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75179.0 rows, 210547.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 522
      00-07                SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, cumulative cost = \{60176.0 rows, 180526.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 518
      00-09                  Limit(offset=[0], fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 517
      00-11                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 516
      00-06                SelectionVectorRemover : rowType = RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{15001.0 rows, 30001.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 521
      00-08                  Limit(offset=[0], fetch=[0]) : rowType = RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 520
      00-10                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 519
      
      

      and before changes:

      00-00    Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount = 1.0, cumulative cost = \{150354.1 rows, 1052637.1 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 452
      00-01      Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount = 1.0, cumulative cost = \{150354.0 rows, 1052637.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 451
      00-02        SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{150353.0 rows, 1052634.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 450
      00-03          Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 449
      00-04            Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 448
      00-05              HashJoin(condition=[=($0, $3)], joinType=[inner]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 60175.0, cumulative cost = \{150350.0 rows, 1052625.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 447
      00-07                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 445
      00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
      
      

      Also both early and late limit 0 optimizations will be enabled by default.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                KazydubB Bohdan Kazydub
                Reporter:
                KazydubB Bohdan Kazydub
                Reviewer:
                Volodymyr Vysotskyi
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: