Description
The impact of this change is for a query that has a single distinct column and does not have any grouping expression like
SELECT COUNT(DISTINCT a) FROM table
The plan will be changed from
AGG-2 (count distinct) Shuffle to a single reducer Partial-AGG-2 (count distinct) AGG-1 (grouping on a) Shuffle by a Partial-AGG-1 (grouping on 1)
to the following one (1.5 uses this)
AGG-2 AGG-1 (grouping on a) Shuffle to a single reducer Partial-AGG-1(grouping on a)
The first plan is more robust. However, to better benchmark the impact of this change, we should use 1.5's plan and use the conf of spark.sql.specializeSingleDistinctAggPlanning to control the plan.
Attachments
Issue Links
- links to