[IMPALA-8656] Support for eagerly fetching and spooling all query result rows - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: Impala 2.12.0, Impala 3.2.0
Fix Version/s: Impala 3.4.0
Component/s: Backend
Labels: None

Description

Impala's current interaction with clients is pull-based: it relies on clients fetching results to trigger the generation of more result row batches until all the result rows have been produced. If a client issues a query without fetching all the results, the query fragments continue to consume resources until the query is cancelled and unregistered, for whatever reason. This is undesirable: resources are held up by misbehaving clients, and other queries may wait in admission control for an extended period of time as a result.

The high-level idea for this JIRA is for Impala to have a mode in which the result sets of queries are eagerly fetched and spooled somewhere (preferably some persistent storage). This way, the cluster's resources are freed up once all result rows have been fetched and stored in the spooling location. Incoming client fetches can then be served from this spooled location.
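
To make the contrast concrete, here is a minimal, illustrative sketch of pull-based fetching versus eager spooling. It is not Impala code: `QueryExecution`, `SpooledResults`, and the `resources_held` flag are hypothetical stand-ins, and a temporary file plays the role of the spooling location.

```python
import pickle
import tempfile
from typing import Iterator, List, Tuple

Row = Tuple[int, int]


class QueryExecution:
    """Hypothetical stand-in for query fragments that hold cluster resources."""

    def __init__(self, num_batches: int, batch_size: int) -> None:
        self._num_batches = num_batches
        self._batch_size = batch_size
        self.resources_held = True

    def row_batches(self) -> Iterator[List[Row]]:
        # Pull-based: each batch is produced only when the consumer asks for it.
        for b in range(self._num_batches):
            yield [(b, i) for i in range(self._batch_size)]
        # Resources are released only after the last batch has been consumed.
        self.resources_held = False


class SpooledResults:
    """Hypothetical spool: eagerly drains the query into a temporary file."""

    def __init__(self, execution: QueryExecution) -> None:
        self._spool = tempfile.TemporaryFile()
        # Eagerly fetch every batch, regardless of client behaviour.
        for batch in execution.row_batches():
            pickle.dump(batch, self._spool)
        self._spool.seek(0)

    def fetch(self) -> List[Row]:
        # Client fetches are served from the spool; an empty list means "done".
        try:
            return pickle.load(self._spool)
        except EOFError:
            return []

    def close(self) -> None:
        self._spool.close()


# Pull-based: a client that fetches one batch and then stalls pins resources.
pull = QueryExecution(num_batches=3, batch_size=2)
first_batch = next(pull.row_batches())
pull_still_holding = pull.resources_held

# Eager spooling: resources are released before the client fetches anything.
eager = QueryExecution(num_batches=3, batch_size=2)
spooled = SpooledResults(eager)
eager_still_holding = eager.resources_held
```

In this sketch the stalled pull-based client leaves `resources_held` set, while the spooled query clears it as soon as the spool is complete; the client can then call `fetch()` at its own pace.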

cc'ing stakiar, twm378, joemcdonnell, lv

Issue Links

is related to

IMPALA-9210	Query timeline should include entry when all query results are spooled	Open
IMPALA-9339	Revise explanation of RowMaterializationTimer	Open
IMPALA-4268	Rework coordinator buffering to buffer more data	Resolved
IMPALA-9818	Add fetch size as option to impala shell	Resolved
IMPALA-8925	Consider replacing ClientRequestState ResultCache with result spooling	Resolved
IMPALA-9856	Enable result spooling by default	Resolved

relates to

IMPALA-10180	Add average size of fetch requests in runtime profile	Resolved

Sub-Tasks

1.	Add RowBatchQueue interface with an implementation backed by a std::queue	Resolved	Sahil Takiar
2.	Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows are fetched	Resolved	Sahil Takiar
3.	Add additional tests in test_result_spooling.py and validate cancellation logic	Resolved	Sahil Takiar
4.	Implement a RowBatchQueue backed by a BufferedTupleStream	Resolved	Sahil Takiar
5.	Coordinator should release admitted memory per-backend rather than per-query	Resolved	Sahil Takiar
6.	Replace deque queue with spillable queue in BufferedPlanRootSink	Resolved	Sahil Takiar
7.	BufferedPlanRootSink should handle non-default fetch sizes	Resolved	Sahil Takiar
8.	Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState	Resolved	Michael Ho
9.	DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch	Resolved	Sahil Takiar
10.	Add additional counters to PlanRootSink	Resolved	Sahil Takiar
11.	DCHECK(!closed_) in SpillableRowBatchQueue::IsEmpty	Resolved	Sahil Takiar
12.	Add failpoint tests to result spooling code	Resolved	Sahil Takiar
13.	Profile fetch performance when result spooling is enabled	Resolved	Sahil Takiar
14.	BufferedPlanRootSink should directly write to a QueryResultSet if one is available	Resolved	Sahil Takiar
15.	Impala Doc: Add docs for PLAN_ROOT_SINK and result spooling	Closed	Alexandra Rodoni

People

Assignee: Sahil Takiar

Reporter: Michael Ho

Votes: 0

Watchers: 5

Dates

Created: 12/Jun/19 02:22

Updated: 03/Mar/21 00:38

Resolved: 08/Oct/19 14:57