If the cardinality of a streaming pre-aggregation is quite large (e.g., ~1B entries in a single fragment), I see the following behavior:
- each of the 16 partitions ends up with about 62M entries (1B/16)
- The target hashtable load factor is 0.75, so ~62M entries need at least ~83M buckets (62M / 0.75). 64M buckets is not enough, so the table wants to expand to the next power of two, 128M buckets.
- sizeof(Bucket) is 16, so 128M buckets is a 2GB bucket array.
- BufferedBlockMgr::ConsumeMemory logs a warning and fails when trying to allocate the 2GB array.
On every row batch, the partitioned aggregation node retries the hashtable expansion, which fails again, resulting in one log message per row batch. Each log message also includes GetStackTrace(), which is quite slow, so this flood of logging makes the query effectively hang.