[IMPALA-3238] Streaming pre-aggregation falls over with high cardinality aggs - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: Impala 2.5.0
Fix Version/s: Impala 2.5.1
Component/s: Backend
Labels: None
Target Version: Impala 2.6.0, Impala 2.5.1

Description

If the cardinality of a streaming pre-aggregation is very large (e.g. ~1B distinct groups in a single fragment), I see the following behavior:

Each of the 16 partitions ends up with about 62M entries (1B/16).
The target hash table load factor is 0.75, so for this number of entries 64M buckets is not enough (64M buckets hold at most 48M entries at that load factor). It wants to expand to 128M buckets.
sizeof(Bucket) is 16, so 128M buckets is a 2GB bucket array.
BufferedBlockMgr::ConsumeMemory logs a warning and fails when trying to allocate the 2GB array (IMPALA-1619).
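
The sizing arithmetic above can be checked with a short sketch. This is not Impala code: `MinBuckets` and `BucketArrayBytes` are hypothetical helper names, and the sketch assumes the bucket count is a power of two that doubles on each resize, which is what the 64M to 128M step in the description suggests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Impala function): smallest power-of-two bucket
// count that can hold `entries` without exceeding `max_load_factor`.
// Assumes the table doubles on each resize.
inline int64_t MinBuckets(int64_t entries, double max_load_factor) {
  assert(entries >= 0 && max_load_factor > 0.0 && max_load_factor <= 1.0);
  int64_t buckets = 1;
  while (static_cast<double>(buckets) * max_load_factor <
         static_cast<double>(entries)) {
    buckets *= 2;
  }
  return buckets;
}

// Size in bytes of a contiguous bucket array.
inline int64_t BucketArrayBytes(int64_t buckets, int64_t bucket_size) {
  return buckets * bucket_size;
}
```

With 62,500,000 entries per partition and a load factor of 0.75, `MinBuckets` returns 134,217,728 (128M) buckets, and `BucketArrayBytes(134217728, 16)` is 2,147,483,648 bytes, i.e. the 2GB array described above.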

On every row batch, the partitioned agg node tries again to expand the hash table, resulting in one log message per row batch. Each log message also includes GetStackTrace(), which is quite slow. This flood of logging makes the query effectively hang.
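
To illustrate why retrying on every row batch is costly, here is a minimal sketch of one way a caller could avoid repeating an allocation that is already known to fail. It is purely illustrative and is not the fix that was applied in Impala: `ExpansionGate`, `TryExpand`, and the `try_alloc` callback are all hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper, not Impala code: remembers the size of the last failed
// allocation so the caller does not retry (and log) it on every row batch.
class ExpansionGate {
 public:
  // `try_alloc` returns true when an allocation of `bytes` succeeds.
  // Returns true if the expansion went ahead.
  bool TryExpand(int64_t bytes,
                 const std::function<bool(int64_t)>& try_alloc) {
    assert(bytes > 0);
    // A request at least as large as one that already failed is skipped
    // without calling the allocator, so no new warning is produced.
    if (failed_bytes_ != 0 && bytes >= failed_bytes_) return false;
    ++attempts_;
    if (try_alloc(bytes)) {
      failed_bytes_ = 0;
      return true;
    }
    failed_bytes_ = bytes;
    return false;
  }

  // Number of times the allocator was actually invoked.
  int64_t attempts() const { return attempts_; }

 private:
  int64_t failed_bytes_ = 0;
  int64_t attempts_ = 0;
};
```

In this sketch, 1,000 row batches that all request the same 2GB array reach the allocator once instead of 1,000 times. A real implementation would also need a way to clear the remembered failure when memory becomes available again, which this sketch omits.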

Issue Links

is related to: IMPALA-3699 Print plan and query summary when memory limit is exceeded (Resolved)

People

Assignee: Tim Armstrong
Reporter: Todd Lipcon
Votes: 0
Watchers: 7

Dates

Created: 24/Mar/16 16:07
Updated: 06/Jul/16 22:43
Resolved: 12/Apr/16 02:27