Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- Impala 4.3.0
- None
Description
Several benchmark runs measuring Impala scan performance indicate opportunities to improve costing around ScanNode and NonGroupingAggregator.
profile_1f4d7a679a3e12d5_4223115700000000.txt shows an example of a simple count query.
Key takeaways:
- Total materialized bytes (row-size * cardinality) correlate strongly with total tuple materialization time per fragment. Row materialization cost should therefore be based on row size instead of an equal cost per scan range.
- NonGroupingAggregator should have a much lower cost than GroupingAggregator. In the example above, the cost of NonGroupingAggregator dominates the scan fragment even though it only does simple counting and no hash table operations.
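The two takeaways above can be illustrated with a minimal sketch. It is not Impala's planner code: the names `ScanEstimate`, `materialization_cost`, and `aggregation_cost`, and all coefficient values, are hypothetical and chosen only to show the shape of the proposed costing.

```python
from dataclasses import dataclass

# Hypothetical coefficients for illustration only; not values used by Impala.
COST_PER_MATERIALIZED_BYTE = 1.0
GROUPING_AGG_COST_PER_ROW = 10.0     # assumed cost of a hash table probe/insert
NON_GROUPING_AGG_COST_PER_ROW = 1.0  # assumed cost of a plain accumulator update


@dataclass(frozen=True)
class ScanEstimate:
    """Planner-side estimates for one scan (hypothetical structure)."""
    cardinality: int      # estimated number of rows materialized
    avg_row_size: float   # estimated materialized bytes per row
    num_scan_ranges: int  # kept only to show that it no longer drives the cost


def materialization_cost(scan: ScanEstimate) -> float:
    """Cost proportional to total materialized bytes (row-size * cardinality)."""
    total_bytes = scan.cardinality * scan.avg_row_size
    return total_bytes * COST_PER_MATERIALIZED_BYTE


def aggregation_cost(input_rows: int, has_grouping: bool) -> float:
    """Charge less per row when there is no grouping, hence no hash table work."""
    if has_grouping:
        per_row = GROUPING_AGG_COST_PER_ROW
    else:
        per_row = NON_GROUPING_AGG_COST_PER_ROW
    return input_rows * per_row


if __name__ == "__main__":
    narrow = ScanEstimate(cardinality=1000, avg_row_size=8.0, num_scan_ranges=4)
    wide = ScanEstimate(cardinality=1000, avg_row_size=80.0, num_scan_ranges=4)
    print(materialization_cost(narrow), materialization_cost(wide))
    print(aggregation_cost(1000, has_grouping=False),
          aggregation_cost(1000, has_grouping=True))
```

With this shape, two scans with the same number of scan ranges but different row widths get different materialization costs, and a simple count query is no longer dominated by the aggregator's cost.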
Attachments
Issue Links
- is related to: IMPALA-11972 Factor in row width during ProcessingCost calculation. (Resolved)