Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-4366 Aggregation Improvement
  3. SPARK-11840

Restore the 1.5's behavior of planning a single distinct aggregation.

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.6.0
    • SQL
    • None

    Description

      The impact of this change is for a query that has a single distinct column and does not have any grouping expression like
      SELECT COUNT(DISTINCT a) FROM table
      The plan will be changed from

      AGG-2 (count distinct)
        Shuffle to a single reducer
          Partial-AGG-2 (count distinct)
            AGG-1 (grouping on a)
              Shuffle by a
                Partial-AGG-1 (grouping on 1)
      

      to the following one (1.5 uses this)

      AGG-2
        AGG-1 (grouping on a)
          Shuffle to a single reducer
            Partial-AGG-1(grouping on a)
      

      The first plan is more robust. However, to better benchmark the impact of this change, we should use 1.5's plan and use the conf of spark.sql.specializeSingleDistinctAggPlanning to control the plan.

      Attachments

        Activity

          People

            yhuai Yin Huai
            yhuai Yin Huai
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: