[IMPALA-8818] Replace deque queue with spillable queue in BufferedPlanRootSink - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Impala 3.4.0
Component/s: Backend
Labels: None

Description

Add a SpillableRowBatchQueue to replace the DequeRowBatchQueue in BufferedPlanRootSink. The SpillableRowBatchQueue will wrap a BufferedTupleStream and take in a TBackendResourceProfile created by PlanRootSink#computeResourceProfile.

BufferedTupleStream Usage:

The wrapped BufferedTupleStream should be created in 'attach_on_read' mode so that pages are attached to the output RowBatch in BufferedTupleStream::GetNext. The BTS should start off pinned (i.e. all pages are pinned). BufferedTupleStream::AddRow returns false if "the unused reservation was not sufficient to add a new page to the stream large enough to fit 'row' and the stream could not increase the reservation to get enough unused reservation". When that happens, the queue should unpin the stream (BufferedTupleStream::UnpinStream) and then try to add the row again. If the row still cannot be added, an error must have occurred (perhaps an IO error), in which case the queue should return the error and fail the query.
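The add-then-unpin-then-retry flow described above can be sketched as follows. This is only an illustration under stated assumptions: `Status`, `FakeTupleStream`, and `AddRowWithSpill` are hypothetical stand-ins written for this example, not Impala classes, and the fake stream models "out of reservation" as a simple row-count cap.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for an error status (not impala::Status).
struct Status {
  bool ok;
  std::string msg;
};
inline Status OkStatus() { return Status{true, ""}; }
inline Status ErrorStatus(const std::string& m) { return Status{false, m}; }

// Toy model of a tuple stream. While pinned it holds at most
// 'pinned_capacity' rows; once unpinned it accepts any number of rows,
// unless 'io_error' is set, which simulates a failed scratch write.
class FakeTupleStream {
 public:
  explicit FakeTupleStream(std::size_t pinned_capacity, bool io_error = false)
      : pinned_capacity_(pinned_capacity), io_error_(io_error) {}

  // Returns false with an OK status when the reservation is exhausted,
  // and false with an error status on a simulated IO failure.
  bool AddRow(int /*row*/, Status* status) {
    *status = OkStatus();
    if (pinned_ && rows_ >= pinned_capacity_) return false;
    if (!pinned_ && io_error_) {
      *status = ErrorStatus("simulated scratch write failure");
      return false;
    }
    ++rows_;
    return true;
  }

  void UnpinStream() { pinned_ = false; }
  bool is_pinned() const { return pinned_; }
  std::size_t num_rows() const { return rows_; }

 private:
  std::size_t pinned_capacity_;
  bool io_error_;
  bool pinned_ = true;  // The stream starts off pinned.
  std::size_t rows_ = 0;
};

// Adds a row; if the pinned stream is out of reservation, unpins the
// stream and retries once. Any failure after that is reported as an error.
Status AddRowWithSpill(FakeTupleStream* stream, int row) {
  assert(stream != nullptr);
  Status status = OkStatus();
  if (stream->AddRow(row, &status)) return status;
  if (!status.ok) return status;  // Hard error on the first attempt.
  stream->UnpinStream();          // Out of reservation: allow spilling.
  if (stream->AddRow(row, &status)) return status;
  if (status.ok) status = ErrorStatus("row could not be added after unpinning");
  return status;
}
```

In this sketch the caller never needs to know whether the stream spilled; it only sees an error status when the retry after unpinning also fails.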

Constraining Resources:

When result spooling is disabled, a user can run a select * from [massive-fact-table] and scroll through the results without affecting the health of the Impala cluster (assuming they close the query promptly). Impala will stream the results one batch at a time to the user.

With result spooling, a naive implementation might try to buffer the entire fact table and end up spilling all of its contents to disk, which can potentially take up a large amount of space. So there need to be restrictions on the memory and disk space used by the BufferedTupleStream in order to ensure that a scan of a massive table does not consume all the memory or disk space of the Impala coordinator.

This problem can be solved by placing a max size on the amount of unpinned memory, perhaps through a new config option MAX_PINNED_RESULT_SPOOLING_MEMORY (maybe set to a few GBs by default). The max amount of pinned memory should already be constrained by the reservation (see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the number of rows returned by a query, and so it should limit the number of rows buffered by the BTS as well (although it is set to 0 by default). SCRATCH_LIMIT already limits the amount of disk space used for spilling (although it is set to -1 by default).
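One way the three limits above could combine into a single admission check is sketched below. The struct and function names are hypothetical, and the sketch assumes that a NUM_ROWS_PRODUCED_LIMIT of 0 and a SCRATCH_LIMIT of -1 mean "no limit", and that a non-positive value for the proposed unpinned-memory cap also means "no cap".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical bundle of the limits discussed above.
struct SpoolingLimits {
  std::int64_t max_unpinned_bytes;       // Proposed cap; <= 0 means no cap here.
  std::int64_t num_rows_produced_limit;  // 0 is assumed to mean no limit.
  std::int64_t scratch_limit;            // -1 is assumed to mean no limit.
};

// Hypothetical snapshot of what the queue currently holds.
struct QueueState {
  std::int64_t rows_buffered;
  std::int64_t unpinned_bytes;
};

// Returns the tightest byte cap on spilled data, or -1 if nothing caps it.
std::int64_t EffectiveSpillCapBytes(const SpoolingLimits& limits) {
  assert(limits.scratch_limit >= -1);
  const bool has_unpinned_cap = limits.max_unpinned_bytes > 0;
  const bool has_scratch_cap = limits.scratch_limit >= 0;
  if (has_unpinned_cap && has_scratch_cap) {
    return std::min(limits.max_unpinned_bytes, limits.scratch_limit);
  }
  if (has_unpinned_cap) return limits.max_unpinned_bytes;
  if (has_scratch_cap) return limits.scratch_limit;
  return -1;
}

// Returns true if a batch may be added to an unpinned stream without
// exceeding the row limit or the effective spill cap.
bool CanSpillBatch(const SpoolingLimits& limits, const QueueState& state,
                   std::int64_t batch_rows, std::int64_t batch_bytes) {
  if (limits.num_rows_produced_limit > 0 &&
      state.rows_buffered + batch_rows > limits.num_rows_produced_limit) {
    return false;
  }
  const std::int64_t cap = EffectiveSpillCapBytes(limits);
  return cap < 0 || state.unpinned_bytes + batch_bytes <= cap;
}
```

With every limit left at its default, this check never rejects a batch, which is exactly the unbounded-spilling case the description warns about.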

The PlanRootSink should attempt to accurately estimate how much memory it needs to buffer all results in memory. This requires setting an accurate value of ResourceProfile#memEstimateBytes_ in PlanRootSink#computeResourceProfile. If statistics are available, the estimate can be based on the number of estimated rows returned multiplied by the size of the rows returned. The min reservation should account for a read and write page for the BufferedTupleStream.
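The estimate described above amounts to simple arithmetic, sketched here in C++ for illustration (the actual computation would live in the Java planner). The names are hypothetical, the page size is a caller-supplied assumption, a negative row count stands for "no statistics available", and clamping the estimate to at least the minimum reservation is a choice made for this sketch rather than something the description specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result type for this sketch.
struct SinkResourceEstimate {
  std::int64_t mem_estimate_bytes;
  std::int64_t min_reservation_bytes;
};

// est_num_rows < 0 is treated as "no statistics available".
SinkResourceEstimate EstimateSinkResources(std::int64_t est_num_rows,
                                           double avg_row_size_bytes,
                                           std::int64_t page_size_bytes) {
  assert(page_size_bytes > 0);
  // One read page plus one write page for the BufferedTupleStream.
  const std::int64_t min_reservation = 2 * page_size_bytes;
  // Without statistics, fall back to the minimum reservation.
  std::int64_t estimate = min_reservation;
  if (est_num_rows >= 0 && avg_row_size_bytes > 0) {
    const auto buffered_bytes = static_cast<std::int64_t>(
        std::ceil(static_cast<double>(est_num_rows) * avg_row_size_bytes));
    estimate = std::max(min_reservation, buffered_bytes);
  }
  return SinkResourceEstimate{estimate, min_reservation};
}
```

For example, one million rows of 64 bytes each with 2 MiB pages gives an estimate of 64,000,000 bytes and a minimum reservation of 4,194,304 bytes.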

Attachments

Issue Links

is duplicated by

IMPALA-8784 Implement a RowBatchQueue backed by a BufferedTupleStream (Resolved)

is related to

IMPALA-8902 TestResultSpooling.test_spilling is flaky (Resolved)

IMPALA-8926 TestResultSpooling::_test_full_queue is flaky (Resolved)

IMPALA-8907 TestResultSpooling.test_slow_query is flaky (Resolved)

Activity

People

Assignee: Sahil Takiar

Reporter: Sahil Takiar

Votes: 0

Watchers: 4

Dates

Created: 31/Jul/19 16:48

Updated: 09/Sep/19 19:10

Resolved: 09/Sep/19 17:58