Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
None
Description
The following ticket was fixed to enable map hash aggregation, but performance with it enabled is worse than with it disabled.
https://issues.apache.org/jira/browse/HIVE-23356
I found a few reasons for this. With a large number of keys, the following messages are logged in large volume, which hurts performance. This can also cause an OOM.
2024-08-02 05:21:53,675 [INFO] [TezChild] |exec.GroupByOperator|: Hash Tbl flush: #hash table = 171000
2024-08-02 05:21:53,713 [INFO] [TezChild] |exec.GroupByOperator|: Hash Table flushed: new size = 153900
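To give a feel for the log volume, here is a minimal sketch in Python. It is a simplified model, not Hive's GroupByOperator code. It assumes that every incoming key is new, that a full table evicts one tenth of its entries (the ratio implied by 171000 -> 153900 in the log above), and that each flush writes two log lines. The function name and parameters are hypothetical.

```python
def simulate_flush_logging(num_distinct_keys, max_entries, flush_divisor=10):
    """Return the number of log lines written while inserting distinct keys.

    Hypothetical model: once the table holds max_entries entries, evict
    size // flush_divisor of them and write two log lines (before/after).
    """
    size = 0
    log_lines = 0
    for _ in range(num_distinct_keys):
        if size >= max_entries:
            size -= max(1, size // flush_divisor)
            log_lines += 2
        size += 1  # every key is new, so the table only grows between flushes
    return log_lines


# One flush of a 171000-entry table leaves 171000 - 171000 // 10 entries,
# which matches the "new size = 153900" line in the log above.
remaining = 171000 - 171000 // 10

# Assumed workload: one million distinct keys in a single task.
lines = simulate_flush_logging(1_000_000, 171000)
print(remaining, lines)
```

In this model the table refills after every flush, so the number of log lines grows linearly with the number of distinct keys.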
By fixing this, we can improve performance as follows.
Before:
After:
Also, the flush size is currently fixed, but performance can be improved by adjusting it to the data:
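As a rough illustration of why the flush size matters (a sketch under stated assumptions, not the patch itself), the function below counts how many flushes are needed when every key is new and each flush evicts a fixed share of a full table. The name `flushes_needed`, the `flush_divisor` parameter, and the workload numbers are all hypothetical.

```python
def flushes_needed(num_distinct_keys, max_entries, flush_divisor):
    """Closed-form flush count when every incoming key is new.

    Hypothetical model: a flush happens whenever the table is full and
    evicts max_entries // flush_divisor entries (at least one).
    """
    if num_distinct_keys <= max_entries:
        return 0
    evicted_per_flush = max(1, max_entries // flush_divisor)
    overflow = num_distinct_keys - max_entries
    # Ceiling division: each flush makes room for evicted_per_flush new keys.
    return -(-overflow // evicted_per_flush)


# Compare evicting 1/10, 1/4, and 1/2 of the table per flush
# for an assumed workload of ten million distinct keys.
for divisor in (10, 4, 2):
    print(divisor, flushes_needed(10_000_000, 171000, divisor))
```

Under these assumptions, evicting a larger share per flush means fewer flushes for the same input. The trade-off is that fewer entries stay in memory to absorb repeated keys, so the best value depends on how often keys repeat in the data.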
Attachments
Issue Links
- relates to: HIVE-23356 Hash aggregation is always disabled while processing querys with grouping sets expressions. (Closed)
- links to