[DRILL-3952] Improve Window Functions performance when not all batches are required to process the current batch - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.2.0
Fix Version/s: 1.3.0
Component/s: Execution - Relational Operators
Labels: None

Description

Currently, the window operator blocks until all batches of the current partition are available. For some queries this is necessary (e.g. an aggregate with no order-by in the window definition), but in other cases the window operator can process the current batch and pass it downstream sooner.
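The order-by clause matters for aggregates because it changes the frame. Without it, the frame is the entire partition, so no row's value is known until the partition's last row arrives. With it, the SQL default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes the current row's peers (rows with the same order-by key), so a row's value is final once its last peer has been seen. The Java sketch below illustrates this over in-memory arrays. The class and method names are hypothetical, not Drill code, and it assumes a single partition already sorted by the order-by key.

```java
import java.util.Arrays;

// Hypothetical illustration, not Drill's implementation.
public class PeerAwareSum {
    // SUM(v) OVER (ORDER BY k): the default frame covers every row up to and
    // including the last peer of the current row. Input is assumed sorted by key.
    static long[] runningSumWithPeers(int[] keys, long[] values) {
        long[] out = new long[keys.length];
        long total = 0;
        int start = 0;
        while (start < keys.length) {
            // Consume the whole peer group (rows sharing the same order-by key).
            int end = start;
            while (end < keys.length && keys[end] == keys[start]) {
                total += values[end];
                end++;
            }
            // Every peer gets the same value; it is final only after the last peer.
            for (int i = start; i < end; i++) {
                out[i] = total;
            }
            start = end;
        }
        return out;
    }

    // SUM(v) OVER (): no order-by, so the frame is the whole partition and
    // every row needs the partition total.
    static long[] partitionSum(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        long[] out = new long[values.length];
        Arrays.fill(out, total);
        return out;
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 2, 3};
        long[] values = {10, 20, 30, 40};
        System.out.println(Arrays.toString(runningSumWithPeers(keys, values)));
        System.out.println(Arrays.toString(partitionSum(values)));
    }
}
```

Running it prints `[10, 60, 60, 100]` for the ordered version. The two rows with key 2 both get 60 because they are peers, and neither value is final until the second one arrives. The unordered version prints `[100, 100, 100, 100]`, which cannot be produced until the whole partition has been read.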

Implementing this should help the window operator use less memory and run faster, especially in the presence of a limit operator.

The purpose of this JIRA is to improve the window operator in the following cases:

aggregate functions, when the window definition has an order-by clause, can process the current batch as soon as they receive the last peer of the batch's last row
lead can process the current batch as soon as it receives one more batch
lag can process the current batch immediately
first_value can process the current batch immediately
last_value, when the window definition has an order-by clause, can process the current batch as soon as it receives the last peer of the batch's last row
row_number, rank and dense_rank can process the current batch immediately
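The cases above can be summarized as a single readiness predicate. The Java sketch below is hypothetical: the enum, the method name, and its parameters are illustrative and not taken from Drill's window operator. It assumes lead uses its default offset of 1, so one following batch is enough, and that the caller already tracks how many batches of the same partition are buffered after the current one, whether the last peer of the current batch's last row has arrived, and whether the partition is complete.

```java
// Hypothetical sketch of the readiness rules; not Drill's actual classes.
public class WindowReadiness {
    enum Fn { AGGREGATE, LEAD, LAG, FIRST_VALUE, LAST_VALUE, ROW_NUMBER, RANK, DENSE_RANK }

    /**
     * @param fn                window function being evaluated
     * @param hasOrderBy        whether the window definition has an order-by clause
     * @param batchesAfter      batches of the same partition buffered after the current one
     * @param lastPeerSeen      whether the last peer of the current batch's last row has arrived
     * @param partitionComplete whether the whole partition has been received
     */
    static boolean canProcessCurrentBatch(Fn fn, boolean hasOrderBy, int batchesAfter,
                                          boolean lastPeerSeen, boolean partitionComplete) {
        if (partitionComplete) {
            return true; // everything is available, so any function can be evaluated
        }
        switch (fn) {
            case LAG:
            case FIRST_VALUE:
            case ROW_NUMBER:
            case RANK:
            case DENSE_RANK:
                return true; // depend only on the current row and earlier rows
            case LEAD:
                return batchesAfter >= 1; // assumes offset 1: needs the next batch
            case AGGREGATE:
            case LAST_VALUE:
                // With an order-by clause the frame ends at the current row's last peer;
                // without one it spans the whole partition, so we must keep waiting.
                return hasOrderBy && lastPeerSeen;
            default:
                throw new IllegalArgumentException("unknown function: " + fn);
        }
    }

    public static void main(String[] args) {
        // lag with nothing buffered after the current batch: can emit immediately
        System.out.println(canProcessCurrentBatch(Fn.LAG, true, 0, false, false));
        // lead must wait for one more batch
        System.out.println(canProcessCurrentBatch(Fn.LEAD, true, 0, false, false));
        // an aggregate without order-by still waits for the end of the partition
        System.out.println(canProcessCurrentBatch(Fn.AGGREGATE, false, 3, true, false));
    }
}
```

The `main` method prints `true`, `false`, `false`. Keeping the rule in one predicate means the operator only has to decide how long to hold a batch, and the per-function evaluation logic stays unchanged.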

Issue Links

incorporates: DRILL-3770 Query with window function having just ORDER BY clause runs out of memory on large datasets (Closed)

People

Assignee: Abdel Hakim Deneche
Reporter: Abdel Hakim Deneche
Reviewer: Dechang Gu
Votes: 0
Watchers: 4

Dates

Created: 19/Oct/15 17:43
Updated: 14/Dec/15 23:28
Resolved: 05/Nov/15 06:03