[IMPALA-6054] Parquet dictionary pages should be freed on dictionary construction - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Done
Affects Version/s: Impala 2.10.0
Fix Version/s: Impala 2.11.0
Component/s: Backend
Labels:
- resource-management

Target Version: Product Backlog

Description

The Parquet scanner uses dictionary_pool_ to allocate memory for the dictionary page (see BaseScalarColumnReader::InitDictionary()). The page is then used to initialize the dictionary in CreateDictionaryDecoder(), which produces a vector of values. For some data types, such as strings, the dictionary is an array of StringValue objects that contain pointers into the dictionary page (see the StringValue specialization of ParquetPlainEncoder::Decode()). In that case, the dictionary page must be kept and attached to the last row batch that references it. For other data types, the values are copied into the dictionary, so the dictionary page is no longer needed once the dictionary has been constructed.
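The difference between the two cases can be illustrated with a minimal, standalone sketch. It is not Impala's code: StringRef is a hypothetical stand-in for StringValue, DecodeInt32Dict and DecodeStringDict are hypothetical helpers, and the page layout assumed is Parquet PLAIN encoding (4-byte little-endian INT32 values; BYTE_ARRAY values as a 4-byte little-endian length followed by the bytes) on a little-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for Impala's StringValue: a non-owning (ptr, len) pair.
struct StringRef {
  const char* ptr;
  uint32_t len;
};

// PLAIN-encoded INT32: every value is copied out of the page, so the
// resulting dictionary does not depend on the page memory afterwards.
std::vector<int32_t> DecodeInt32Dict(const uint8_t* page, size_t page_len) {
  std::vector<int32_t> dict;
  for (size_t off = 0; off + sizeof(int32_t) <= page_len; off += sizeof(int32_t)) {
    int32_t v;
    std::memcpy(&v, page + off, sizeof(v));  // assumes a little-endian host
    dict.push_back(v);
  }
  return dict;
}

// PLAIN-encoded BYTE_ARRAY: each entry stores a pointer into the page,
// so the page must outlive every use of the dictionary.
std::vector<StringRef> DecodeStringDict(const uint8_t* page, size_t page_len) {
  std::vector<StringRef> dict;
  size_t off = 0;
  while (off + sizeof(uint32_t) <= page_len) {
    uint32_t len;
    std::memcpy(&len, page + off, sizeof(len));  // assumes a little-endian host
    off += sizeof(len);
    if (len > page_len - off) break;  // truncated entry: stop decoding
    dict.push_back({reinterpret_cast<const char*>(page + off), len});
    off += len;
  }
  return dict;
}
```

Overwriting or freeing the page after DecodeInt32Dict() leaves the decoded values intact, whereas doing the same after DecodeStringDict() leaves every StringRef dangling.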

Currently, all dictionary pages remain in dictionary_pool_ and are attached to the last row batch passed to other ExecNodes (see FlushRowGroupResources()). Only StringValue dictionary pages (or pages of other types whose values point to data in the page) should be passed on with the row batch. Dictionary pages for all other types should be freed as soon as the dictionary has been constructed.

Issue Links

relates to: IMPALA-5304 Parquet scanner transfers decompression buffers when not needed (Resolved)

People

Assignee: Csaba Ringhofer

Reporter: Joe McDonnell

Votes: 0

Watchers: 3

Dates

Created: 13/Oct/17 20:21

Updated: 21/Nov/17 17:52

Resolved: 21/Nov/17 17:52