[DRILL-1457] Limit operator optimization : push limit operator past exchange operator - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.2.0
Component/s: Query Planning & Optimization
Labels:
- no_verified_test

Description

When a query contains a LIMIT clause, we want to push the LIMIT operator down as far as possible, so that the upstream operators stop executing once the desired number of rows has been fetched.

Within one execution fragment, Drill applies a pull model. In many cases there is no performance impact if the LIMIT operator is not pushed down, since LIMIT informs the upstream operators to stop. Across multiple fragments, however, Drill uses a push model. If LIMIT is not pushed past the exchange operator, the upstream fragment continues executing until it receives a notice from the downstream fragment, even if the LIMIT operator has already received the required number of rows.
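The difference between the two models can be illustrated with a toy simulation. This is only a sketch and not Drill code: the function names, the batch size of 4096 rows, and the "notice delay" of two batches are all assumptions chosen for illustration.

```python
from itertools import islice


def scan(total_rows, stats):
    """Toy scan operator: counts every row it produces."""
    for row in range(total_rows):
        stats["scanned"] += 1
        yield row


def pull_limit(total_rows, fetch):
    """Pull model: LIMIT asks for rows one at a time and simply stops asking."""
    stats = {"scanned": 0}
    rows = list(islice(scan(total_rows, stats), fetch))
    return rows, stats["scanned"]


def push_limit(total_rows, fetch, batch_size, notice_delay):
    """Push model: the upstream side keeps sending batches until a stop
    notice arrives. notice_delay is the (assumed) number of extra batches
    sent between LIMIT being satisfied and the notice reaching upstream."""
    scanned = 0
    received = []
    limit_reached = False
    extra_batches = 0
    for start in range(0, total_rows, batch_size):
        if limit_reached:
            if extra_batches == notice_delay:
                break  # the stop notice has finally arrived
            extra_batches += 1
        batch = range(start, min(start + batch_size, total_rows))
        scanned += len(batch)
        if not limit_reached:
            received.extend(batch)
            limit_reached = len(received) >= fetch
    return received[:fetch], scanned


print(pull_limit(100_000, 1))           # scan produces exactly one row
print(push_limit(100_000, 1, 4096, 2))  # scan produces three full batches
```

In the pull case the scan produces one row for `LIMIT 1`. In the push case it produces the first batch plus every batch sent before the notice arrives, regardless of how small the limit is.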

For instance:

explain plan for select * from dfs.`/Users/jni/work/tpch-data/tpch-sf10/lineitem` limit 1;


00-00 Screen
00-01 SelectionVectorRemover
00-02 Limit(fetch=[1])
00-03 UnionExchange
01-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/tpch-data/tpch-sf10/lineitem]], selectionRoot=/Users/jni/work/tpch-data/tpch-sf10/lineitem, columns=[SchemaPath [`*`]]]])

The query profile shows that the Scan operator fetches many more records than desired:

Minor Fragment   Start   End     Total Time   Max Records   Max Batches
01-00-xx         0.507   1.059   0.552        43688         8
01-01-xx         0.570   1.054   0.484        27305         5
01-02-xx         0.617   1.038   0.421        16383         3
01-03-xx         0.668   1.056   0.388        10922         2
01-04-xx         0.740   1.055   0.315        10922         2
01-05-xx         0.813   1.057   0.244        5461          1

In the above plan, there are two choices for performance optimization:
1) Push the LIMIT operator past the EXCHANGE operator, ideally into the SCAN operator.
2) Disable the parallel plan by removing the EXCHANGE operator.
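Option 1 can be sketched as follows: each upstream fragment applies its own local limit before sending rows to the exchange, and a final limit is still applied above the exchange, since several fragments may each contribute up to `fetch` rows. This is a hypothetical toy model, not the attached patch; `fragment_scan`, `union_exchange`, and `run_pushed_limit` are names invented for illustration.

```python
from itertools import chain, islice


def fragment_scan(fragment_id, rows_per_fragment):
    """Toy scan running inside one minor fragment."""
    for i in range(rows_per_fragment):
        yield (fragment_id, i)


def limit(rows, fetch):
    """Toy LIMIT operator: passes through at most `fetch` rows."""
    return islice(rows, fetch)


def union_exchange(fragment_outputs):
    """Toy exchange: concatenates whatever each fragment sent."""
    return chain.from_iterable(fragment_outputs)


def run_pushed_limit(num_fragments, rows_per_fragment, fetch):
    # Local limit below the exchange: each fragment sends at most `fetch` rows.
    sent = [
        list(limit(fragment_scan(f, rows_per_fragment), fetch))
        for f in range(num_fragments)
    ]
    rows_sent = sum(len(batch) for batch in sent)
    # Final limit above the exchange trims the combined result to `fetch` rows.
    result = list(limit(union_exchange(sent), fetch))
    return result, rows_sent


result, rows_sent = run_pushed_limit(num_fragments=6, rows_per_fragment=1000, fetch=1)
print(result, rows_sent)
```

Under this model the number of rows crossing the exchange is bounded by `num_fragments * fetch`, independent of the size of the scanned data.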

Attachments


0001-DRILL-1457-Push-Limit-past-through-UnionExchange.patch
24/Sep/15 23:19
10 kB
Jinfeng Ni

Issue Links

is related to

DRILL-3722 LIMIT 1 query on top of a dir with 50K files takes ~150 seconds

Open

relates to

CALCITE-831 Rules to push down limits

Open

DRILL-636 Push Limit operator past project, left outer join, union all operator and down into scan operator

Open

People

Assignee: Jinfeng Ni

Reporter: Jinfeng Ni

Votes: 0

Watchers: 4

Dates

Created: 26/Sep/14 17:36

Updated: 05/Oct/15 15:39

Resolved: 26/Sep/15 06:22