[IMPALA-5307] Consider always copying-out Disk I/O buffers instead of attaching to RowBatches - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Impala 2.11.0
Component/s: Backend
Labels:
- resource-management

Target Version: Impala 2.11.0

Description

~~IMPALA-4835~~ would be greatly simplified if we didn't have to attach disk I/O buffers to RowBatches and handle the resulting complexity.

Disk I/O buffers currently need to be attached to RowBatches if the row batches directly reference var-len data in the buffer. This can only happen when all of the following conditions hold:

- The column being read contains strings.
- The string data is not dictionary-encoded in Parquet (we copy out the dictionary data in Parquet).
- The string data is not compressed with a general-purpose compression algorithm (gzip, Snappy, etc.).

In practice, this covers plain-encoded strings in uncompressed Parquet files and any strings in uncompressed text, RCFile, Avro, or SequenceFile files.
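The conditions above can be condensed into a single predicate. What follows is a minimal C++ sketch that encodes only the rules stated in this issue. `FileFormat`, `Codec`, `ColumnScanInfo` and `ReferencesIoBuffer` are hypothetical names for illustration, not Impala's actual types or functions.

```cpp
// Hypothetical types for illustration only; not Impala's actual classes.
enum class FileFormat { kParquet, kText, kRcFile, kAvro, kSequenceFile };
enum class Codec { kNone, kGzip, kSnappy };

struct ColumnScanInfo {
  FileFormat format;
  Codec codec;              // general-purpose compression applied to the data
  bool is_string;           // column holds var-len string values
  bool dictionary_encoded;  // only meaningful for Parquet
};

// Returns true if string values produced by the scanner would point directly
// into the disk I/O buffer, so the buffer must stay alive as long as the
// RowBatch that references it (i.e. be attached, or the data copied out).
constexpr bool ReferencesIoBuffer(const ColumnScanInfo& col) {
  // Per this issue, only var-len string data can reference the buffer.
  if (!col.is_string) return false;
  // Assumed: compressed data is decompressed into a separate buffer, so
  // strings never point into the raw I/O buffer.
  if (col.codec != Codec::kNone) return false;
  // Parquet dictionary data is already copied out of the I/O buffer.
  if (col.format == FileFormat::kParquet && col.dictionary_encoded) {
    return false;
  }
  // Plain strings in uncompressed Parquet, text, RCFile, Avro or SequenceFile.
  return true;
}
```

For example, `ReferencesIoBuffer({FileFormat::kParquet, Codec::kNone, true, true})` is false because the dictionary is copied out. Uncompressed Avro strings, `{FileFormat::kAvro, Codec::kNone, true, false}`, yield true.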

In those cases, avoiding the copy could provide some performance benefit. However, it's unclear whether any of those file formats are, or should be, used in performance-critical workloads, because the storage density of uncompressed strings is almost always terrible.

We should measure the performance impact of the additional copies, but I suspect that it is not severe and does not affect any important use cases.
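Under the proposal, a scanner would always copy var-len data out of the I/O buffer into memory owned by the RowBatch. The I/O buffer could then be returned immediately instead of being attached. Below is a minimal C++ sketch of that copy-out step. `StringValue`, `BatchPool` and `CopyOutStrings` are hypothetical, simplified stand-ins, not Impala's actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical string slot: points at bytes owned by someone else.
struct StringValue {
  const char* ptr;
  std::size_t len;
};

// Hypothetical batch-owned memory: allocations live as long as the pool,
// independent of any disk I/O buffer.
class BatchPool {
 public:
  char* Allocate(std::size_t len) {
    chunks_.push_back(std::make_unique<char[]>(len > 0 ? len : 1));
    return chunks_.back().get();
  }

 private:
  std::vector<std::unique_ptr<char[]>> chunks_;
};

// Copies every string's bytes out of the I/O buffer into one contiguous
// batch-owned allocation and repoints each StringValue at the copy.
// Afterwards the I/O buffer is no longer referenced and can be released.
void CopyOutStrings(std::vector<StringValue>* values, BatchPool* pool) {
  assert(values != nullptr && pool != nullptr);
  std::size_t total = 0;
  for (const StringValue& v : *values) total += v.len;
  char* dst = pool->Allocate(total);
  for (StringValue& v : *values) {
    if (v.len > 0) std::memcpy(dst, v.ptr, v.len);
    v.ptr = dst;
    dst += v.len;
  }
}
```

Using one allocation per batch keeps the extra work to a single `memcpy` per string. That per-string copy is the overhead this issue proposes to measure.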

Issue Links

breaks: IMPALA-6489 ASAN use-after-poison in impala::HdfsScanner::InitTupleFromTemplate (Resolved)

People

Assignee: Tim Armstrong

Reporter: Tim Armstrong

Votes: 0

Watchers: 2

Dates

Created: 11/May/17 20:09

Updated: 12/Feb/18 22:03

Resolved: 22/Nov/17 19:12
