Description
Most of aggregate function(e.g average) with "distinct" value will requires all of the records in the same group to be shuffled into a single node, however, as part of the optimization, those records can be partially aggregated before shuffling, that probably reduces the overhead of shuffling significantly.
Attachments
Issue Links
- depends upon
-
SPARK-4233 Simplify the Aggregation Function implementation
- Resolved
- links to