[IMPALA-6290] Simplify ScannerContext buffer management to only use one I/O buffer at a time. - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Impala 2.12.0
Component/s: Backend
Labels: resource-management
Target Version: Impala 2.12.0

Description

I'm doing this as part of the HDFS buffer management work but splitting it out as a subtask since it's a logically independent change.

ScannerContext currently depends on the scanners calling ReleaseCompletedResources() repeatedly to free up buffers. This works today, but if we add a hard limit on the number of I/O buffers, we could hit resource exhaustion if we scan too far ahead without calling ReleaseCompletedResources(). E.g. if we have 3 * 8MB I/O buffers to use and try to scan 25MB before calling ReleaseCompletedResources(), we end up in a state where all I/O buffers are sitting in the ScannerContext and none is free for the rest of the data.

Certain ScannerContext operations can also exhaust the I/O buffers no matter how frequently ReleaseCompletedResources() is called. E.g. ReadBytes(25MB) or SkipBytes(25MB) would run into that problem with the current implementation.
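
The arithmetic behind these examples can be sketched roughly as follows. This is a hypothetical model, not Impala code: `BuffersRetained` and `ExhaustsPool` are illustrative names, and the sketch assumes equally sized buffers, a scan that starts on a buffer boundary, and that every buffer touched stays held until it is explicitly released.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kMB = 1024 * 1024;

// Hypothetical helper: number of I/O buffers held at once if every buffer
// touched while scanning 'bytes_scanned' bytes is retained until released.
int64_t BuffersRetained(int64_t bytes_scanned, int64_t buffer_size) {
  assert(buffer_size > 0);
  // Ceiling division: a partially consumed buffer still occupies a whole slot.
  return (bytes_scanned + buffer_size - 1) / buffer_size;
}

// Hypothetical helper: true if the scan needs more buffers than the pool
// contains, i.e. it cannot finish without releasing buffers part-way through.
bool ExhaustsPool(int64_t bytes_scanned, int64_t buffer_size,
                  int64_t pool_size) {
  return BuffersRetained(bytes_scanned, buffer_size) > pool_size;
}
```

Under these assumptions, a 25MB scan over 8MB buffers needs four buffers at once, one more than a pool of three can supply.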

I spent some time looking at the ScannerContext API and the calling patterns of the scanners and came to the conclusion that there's no requirement for us to accumulate buffers in completed_io_buffers_ - after ~~IMPALA-5307~~ we don't generally assume that the memory returned from previous calls remains valid when the read position from the stream is advanced.
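
A minimal sketch of the one-buffer-at-a-time idea, for illustration only. `BufferPool` and `OneBufferStream` are hypothetical stand-ins, not the real ScannerContext classes. The model assumes equally sized buffers and that callers never rely on memory from a buffer once the read position has moved past it.

```cpp
#include <algorithm>
#include <cstdint>

constexpr int64_t kMB = 1024 * 1024;

// Hypothetical fixed-size pool that only counts free I/O buffers.
class BufferPool {
 public:
  explicit BufferPool(int capacity) : free_(capacity) {}
  bool TryAcquire() {
    if (free_ == 0) return false;
    --free_;
    return true;
  }
  void Release() { ++free_; }
  int free_buffers() const { return free_; }

 private:
  int free_;
};

// Hypothetical stream that holds at most one I/O buffer at any time.
class OneBufferStream {
 public:
  OneBufferStream(BufferPool* pool, int64_t buffer_size)
      : pool_(pool), buffer_size_(buffer_size) {}
  OneBufferStream(const OneBufferStream&) = delete;
  OneBufferStream& operator=(const OneBufferStream&) = delete;
  ~OneBufferStream() {
    if (holding_) pool_->Release();
  }

  // Advances the read position by 'n' bytes. Returns false only if the pool
  // has no buffer available when the next one is needed.
  bool SkipBytes(int64_t n) {
    while (n > 0) {
      if (remaining_ == 0) {
        // The current buffer is fully consumed: return it before taking
        // the next one, so the stream never holds two buffers.
        if (holding_) {
          pool_->Release();
          holding_ = false;
        }
        if (!pool_->TryAcquire()) return false;
        holding_ = true;
        remaining_ = buffer_size_;
      }
      const int64_t step = std::min(n, remaining_);
      remaining_ -= step;
      n -= step;
    }
    return true;
  }

  int buffers_held() const { return holding_ ? 1 : 0; }

 private:
  BufferPool* pool_;
  int64_t buffer_size_;
  int64_t remaining_ = 0;  // Unread bytes left in the current buffer.
  bool holding_ = false;
};
```

In this model, skipping 25MB over 8MB buffers succeeds even with a pool of three, because each exhausted buffer goes back to the pool before the next is acquired, and the stream ends up holding a single buffer.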

People

Assignee: Tim Armstrong

Reporter: Tim Armstrong

Votes: 0

Watchers: 1

Dates

Created: 07/Dec/17 00:51

Updated: 12/Jan/18 02:07

Resolved: 12/Jan/18 02:07