Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- Impala 2.3.0
- None
Description
For each input row, the aggregation node calls HashTable::Find() and then, if the grouping key isn't already present in the table, HashTable::Insert(). Both methods probe the hash table to find the same bucket, so every newly inserted key pays for two probes. If we added a FindOrInsert() method to the hash table that returned a modifiable iterator pointing to the bucket, we could save a significant number of hash table probes.
There is already a TODO in the partitioned-aggregation-node-ir.cc code for this, so I'm creating a JIRA to track the issue.
This could speed up aggregations with large output size significantly, e.g. TPC-H query 13 (see IMPALA-2470).
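To make the proposal concrete, here is a minimal sketch of the two access patterns on a toy open-addressing table. It is not Impala's HashTable: the ToyHashTable class, its Bucket layout, the hash function, the bool* found out-parameter, and the two Count... helpers are all hypothetical names invented for illustration, and the sketch returns a raw bucket pointer rather than an iterator. It only counts how many times a probe sequence is started under each pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy open-addressing hash table with linear probing (illustrative only,
// NOT Impala's HashTable). Maps an int64 grouping key to a running count.
class ToyHashTable {
 public:
  struct Bucket {
    bool filled = false;
    int64_t key = 0;
    int64_t count = 0;
  };

  // 'capacity' must be a power of two and exceed the number of distinct keys.
  explicit ToyHashTable(size_t capacity) : buckets_(capacity) {
    assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
  }

  // Returns the bucket holding 'key', or nullptr if the key is absent.
  Bucket* Find(int64_t key) {
    Bucket* b = Probe(key);
    return b->filled ? b : nullptr;
  }

  // Inserts 'key', which must be absent. Repeats the probe Find() already did.
  Bucket* Insert(int64_t key) {
    Bucket* b = Probe(key);
    assert(!b->filled);
    Fill(b, key);
    return b;
  }

  // One probe: returns the matching bucket, or claims the empty bucket the
  // probe stopped at. '*found' reports whether the key already existed.
  Bucket* FindOrInsert(int64_t key, bool* found) {
    Bucket* b = Probe(key);
    *found = b->filled;
    if (!b->filled) Fill(b, key);
    return b;
  }

  // Number of probe sequences started so far.
  int64_t num_probes() const { return num_probes_; }

 private:
  // Walks the probe sequence until it reaches 'key' or an empty bucket.
  Bucket* Probe(int64_t key) {
    ++num_probes_;
    const size_t mask = buckets_.size() - 1;
    const uint64_t hash = static_cast<uint64_t>(key) * 0x9E3779B97F4A7C15ULL;
    size_t idx = static_cast<size_t>(hash >> 32) & mask;
    while (buckets_[idx].filled && buckets_[idx].key != key) {
      idx = (idx + 1) & mask;
    }
    return &buckets_[idx];
  }

  void Fill(Bucket* b, int64_t key) {
    // Keep at least one bucket empty so that Probe() always terminates.
    assert(num_filled_ + 1 < buckets_.size());
    b->filled = true;
    b->key = key;
    b->count = 0;
    ++num_filled_;
  }

  std::vector<Bucket> buckets_;
  size_t num_filled_ = 0;
  int64_t num_probes_ = 0;
};

// Pattern described in the issue: Find(), then Insert() on a miss.
int64_t CountWithFindThenInsert(const std::vector<int64_t>& keys,
                                ToyHashTable* ht) {
  for (int64_t key : keys) {
    ToyHashTable::Bucket* b = ht->Find(key);
    if (b == nullptr) b = ht->Insert(key);
    ++b->count;
  }
  return ht->num_probes();
}

// Proposed pattern: a single FindOrInsert() per input row.
int64_t CountWithFindOrInsert(const std::vector<int64_t>& keys,
                              ToyHashTable* ht) {
  for (int64_t key : keys) {
    bool found = false;
    ToyHashTable::Bucket* b = ht->FindOrInsert(key, &found);
    ++b->count;
  }
  return ht->num_probes();
}
```

In this toy model, aggregating the six keys {1, 2, 1, 3, 2, 1} starts 9 probes with Find() plus Insert() and 6 with FindOrInsert(). The difference equals the number of distinct groups, which is consistent with the expectation that aggregations with a large output size benefit most.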
Issue Links
- is a child of IMPALA-2755 Clean up memory management in backend (Resolved)