[IMPALA-4268] Rework coordinator buffering to buffer more data - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Impala 2.8.0
Fix Version/s: Impala 3.4.0
Component/s: Backend
Labels:
- query-lifecycle
- resource-management

Target Version: Impala 3.4.0

Description

PlanRootSink runs the producer (the coordinator fragment execution thread) in a separate thread from the consumer (i.e. the thread handling the fetch RPC), which calls GetNext() to retrieve the rows. The implementation was simplified by handing off a single batch at a time from the producer to the consumer.

This decision causes some problems:

- Many context switches for the sender. Adding buffering would allow the sender to append to the buffer and continue making progress without a context switch.
- Query execution can't release resources until the client has fetched the final batch, because the coordinator fragment thread is still running and potentially producing backpressure all the way down the plan tree.
- The consumer can't fulfil fetch requests greater than Impala's internal BATCH_SIZE, because it is only given one batch at a time.
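The first two problems can be illustrated with a minimal sketch, which is not Impala's implementation. It uses Python's standard `queue.Queue` as a stand-in for the hand-off between producer and consumer. The names `producer`, `run`, and `buffer_capacity` are hypothetical, and `BATCH_SIZE = 4` is an arbitrary value chosen for the example. With a capacity of one batch the producer stays blocked until the consumer fetches; with a larger buffer it can finish before any fetch happens.

```python
import queue
import threading

BATCH_SIZE = 4  # arbitrary value for this sketch, not Impala's real setting


def producer(buf, num_batches, done):
    """Append batches to the buffer; put() blocks only when the buffer is full."""
    for i in range(num_batches):
        batch = [i * BATCH_SIZE + j for j in range(BATCH_SIZE)]
        buf.put(batch)
    done.set()  # the producer is finished; it could release its resources here


def run(buffer_capacity, num_batches):
    """Return (producer finished before the first fetch?, all rows fetched)."""
    buf = queue.Queue(maxsize=buffer_capacity)
    done = threading.Event()
    t = threading.Thread(target=producer, args=(buf, num_batches, done))
    t.start()
    # Give the producer time to run before the consumer fetches anything.
    finished_before_fetch = done.wait(timeout=0.5)
    rows = []
    for _ in range(num_batches):
        rows.extend(buf.get())  # consumer side: one batch per get()
    t.join()
    return finished_before_fetch, rows
```

Calling `run(buffer_capacity=1, num_batches=3)` models the single-batch hand-off, while `run(buffer_capacity=3, num_batches=3)` models a buffer large enough to hold the whole result.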

The tricky part is managing the mismatch between the size of the row batches processed in Send() and the size of the fetch result asked for by the client, without impacting performance too badly. The sender materializes output rows in a QueryResultSet that is owned by the coordinator. That is not currently a splittable object; instead it contains the actual RPC response struct that will hit the wire when the RPC completes. An asynchronous sender does not know the fetch size, because in theory it can change on every fetch call (although most reasonable clients will not randomly change the fetch size).
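One way to picture the size mismatch is a buffer that accepts batches of any size and serves fetches of any size, splitting a batch when a fetch ends partway through it. The sketch below is an illustration only and assumes rows can be copied out of a batch individually, which the description notes is not true of QueryResultSet today. The class `RowBuffer` and its methods `send` and `fetch` are hypothetical names, not Impala APIs.

```python
from collections import deque


class RowBuffer:
    """Hypothetical buffer that decouples batch size from fetch size."""

    def __init__(self):
        self._batches = deque()
        self._offset = 0  # rows already consumed from the head batch

    def send(self, batch):
        """Producer side: append a whole batch without knowing the fetch size."""
        if batch:
            self._batches.append(list(batch))

    def fetch(self, fetch_size):
        """Consumer side: return up to fetch_size rows, crossing batch boundaries."""
        out = []
        while self._batches and len(out) < fetch_size:
            head = self._batches[0]
            take = min(fetch_size - len(out), len(head) - self._offset)
            out.extend(head[self._offset:self._offset + take])
            self._offset += take
            if self._offset == len(head):
                # Head batch fully consumed; move on to the next one.
                self._batches.popleft()
                self._offset = 0
        return out
```

In this sketch a fetch may be larger than any single batch, and the fetch size may change from call to call, because the split point is tracked by the buffer rather than by the sender.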

Attachments

rows-produced-histogram.png (75 kB), attached 22/Aug/18 21:19 by Tim Armstrong

Issue Links

is blocked by

IMPALA-2905 Use FragmentMgr to manage coordinator fragments

Resolved

is depended upon by

IMPALA-7351 Add memory estimates for plan nodes and sinks with missing estimates

Resolved

is related to

IMPALA-7312 Non-blocking mode for Fetch() RPC

Resolved

IMPALA-558 HS2::FetchResults sets hasMoreRows in many cases where no more rows are to be returned

Resolved

relates to

IMPALA-1618 Impala server should always try to fulfill requested fetch size

Resolved

IMPALA-8656 Support for eagerly fetching and spooling all query result rows

Resolved

Sub-Tasks

There are no Sub-Tasks for this issue.

People

Assignee: Unassigned

Reporter: Henry Robinson

Votes: 0

Watchers: 12

Dates

Created: 10/Oct/16 18:36

Updated: 26/Sep/19 15:26

Resolved: 26/Sep/19 15:26