[IMPALA-2940] Parquet DictDecoders accumulate throughout query - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: Impala 2.3.0
Fix Version/s: Impala 2.5.0, Impala 2.3.4
Component/s: Backend
Labels: resource-management
Target Version: Impala 2.5.0, Impala 2.3.4

Description

Parquet dictionary decoders can accumulate throughout query execution. One decoder is created per column per split. Each decoder contains a vector of dictionary values that is not cleared when the scanner is finished with it.

I've attached a graph of memory usage when running this query on TPC-DS scale factor 100. "Before" is cdh5-trunk, and "after" is with a fix that deletes the ColumnReader objects after each input split.

use tpcds_100_parquet;
set num_scanner_threads=8;
select * from store_sales where ss_sold_time_sk = 12512434;
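
The accumulation described above can be illustrated with a minimal, self-contained C++ sketch. This is not Impala's actual code: `DictDecoderSketch`, `ColumnReaderSketch`, `ReaderPool`, and both `PeakEntries...` functions are hypothetical names invented for illustration. It only assumes that each reader owns a decoder holding a vector of dictionary values, and compares the peak number of live dictionary entries when readers are kept for the whole query versus released after each split.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a dictionary decoder: it owns one decoded
// value per dictionary entry.
struct DictDecoderSketch {
  explicit DictDecoderSketch(std::size_t num_entries) : dict(num_entries, 0) {}
  std::vector<int64_t> dict;
};

// Hypothetical stand-in for a per-column reader that owns its decoder.
struct ColumnReaderSketch {
  explicit ColumnReaderSketch(std::size_t num_entries) : decoder(num_entries) {}
  DictDecoderSketch decoder;
};

using ReaderPool = std::vector<std::unique_ptr<ColumnReaderSketch>>;

// Counts the dictionary entries currently held by all readers in a pool.
std::size_t LiveDictEntries(const ReaderPool& pool) {
  std::size_t total = 0;
  for (const auto& reader : pool) total += reader->decoder.dict.size();
  return total;
}

// Accumulating pattern: readers are stored in a pool that lives for the
// whole query, so dictionaries from finished splits are never released.
std::size_t PeakEntriesQueryScoped(int num_splits, int num_columns,
                                   std::size_t dict_size) {
  ReaderPool query_pool;
  std::size_t peak = 0;
  for (int split = 0; split < num_splits; ++split) {
    for (int col = 0; col < num_columns; ++col) {
      query_pool.push_back(std::make_unique<ColumnReaderSketch>(dict_size));
    }
    peak = std::max(peak, LiveDictEntries(query_pool));
  }
  return peak;
}

// Per-split pattern: readers live in a pool scoped to one split and are
// destroyed (along with their dictionaries) when the split is finished.
std::size_t PeakEntriesSplitScoped(int num_splits, int num_columns,
                                   std::size_t dict_size) {
  std::size_t peak = 0;
  for (int split = 0; split < num_splits; ++split) {
    ReaderPool split_pool;
    for (int col = 0; col < num_columns; ++col) {
      split_pool.push_back(std::make_unique<ColumnReaderSketch>(dict_size));
    }
    peak = std::max(peak, LiveDictEntries(split_pool));
  }  // split_pool goes out of scope here, freeing this split's readers.
  return peak;
}
```

In this sketch, the query-scoped peak grows with the number of splits (splits × columns × dictionary size), while the split-scoped peak stays bounded by columns × dictionary size for a single scanner.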

Attachments

mem-usage-dict-leak.png (5 kB), attached by Tim Armstrong, 04/Feb/16 00:14

Issue Links

relates to: IMPALA-2885 Scanners store per-split objects in per-query object pool (Resolved)

People

Assignee: Tim Armstrong
Reporter: Tim Armstrong
Votes: 0
Watchers: 3

Dates

Created: 04/Feb/16 00:19
Updated: 23/Feb/16 04:57
Resolved: 05/Feb/16 15:01