[SPARK-11840] Restore the 1.5's behavior of planning a single distinct aggregation. - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.6.0
Component/s: SQL
Labels: None
Target Version/s: 1.6.0

Description

This change affects queries that have a single distinct column and no grouping expressions, such as
SELECT COUNT(DISTINCT a) FROM table
The plan will be changed from

AGG-2 (count distinct)
  Shuffle to a single reducer
    Partial-AGG-2 (count distinct)
      AGG-1 (grouping on a)
        Shuffle by a
          Partial-AGG-1 (grouping on a)

to the following plan, which is the one 1.5 uses

AGG-2
  AGG-1 (grouping on a)
    Shuffle to a single reducer
      Partial-AGG-1 (grouping on a)

The first plan is more robust. However, to better benchmark the impact of this change, we should restore 1.5's plan and use the conf spark.sql.specializeSingleDistinctAggPlanning to control which plan is used.
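
To make the difference between the two plans concrete, here is a toy model in plain Python. It is not Spark code: the function names, the reducer count, and the use of Python sets to stand in for "grouping on a" are all assumptions made for this sketch, and NULL handling is left out. It only illustrates that both operator pipelines produce the same COUNT(DISTINCT a) while moving data differently.

```python
def count_distinct_two_shuffles(partitions, num_reducers=3):
    """Toy model of the first plan (hypothetical helper, not a Spark API)."""
    # Partial-AGG-1 (grouping on a): dedupe within each input partition.
    partial = [set(p) for p in partitions]
    # Shuffle by a: route every value to a reducer chosen by its hash.
    # AGG-1 (grouping on a): each reducer keeps one entry per value.
    reducers = [set() for _ in range(num_reducers)]
    for values in partial:
        for v in values:
            reducers[hash(v) % num_reducers].add(v)
    # Partial-AGG-2 (count distinct): one partial count per reducer.
    partial_counts = [len(r) for r in reducers]
    # Shuffle to a single reducer, then AGG-2: add up the partial counts.
    return sum(partial_counts)


def count_distinct_single_reducer(partitions):
    """Toy model of the second (1.5-style) plan (hypothetical helper)."""
    # Partial-AGG-1 (grouping on a): dedupe within each input partition.
    partial = [set(p) for p in partitions]
    # Shuffle to a single reducer, then AGG-1 (grouping on a):
    # one reducer merges every partially deduplicated group.
    merged = set()
    for values in partial:
        merged |= values
    # AGG-2: count the groups that remain.
    return len(merged)


# Three input partitions of column a, with duplicates within and across them.
partitions = [[1, 2, 2, 3], [3, 4], [1, 5, 5]]
print(count_distinct_two_shuffles(partitions))    # 5
print(count_distinct_single_reducer(partitions))  # 5
```

In this model the second function funnels every distinct value of a through one reducer, whereas the first spreads the values across several reducers and sends only one partial count per reducer to the final step.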

Issue Links

links to: [Github] Pull Request #9828 (yhuai)

People

Assignee: Yin Huai
Reporter: Yin Huai
Votes: 0
Watchers: 2

Dates

Created: 19/Nov/15 01:55
Updated: 19/Nov/15 19:02
Resolved: 19/Nov/15 19:02