Spark: SPARK-4367, sub-task of SPARK-4366 (Aggregation Improvement)

Partial aggregation support for DISTINCT aggregation


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels: None

    Description

      Most aggregate functions (e.g. average) applied to DISTINCT values require all records of the same group to be shuffled to a single node. As an optimization, those records can be partially aggregated before the shuffle, for example by dropping duplicate (group key, value) pairs within each partition, which can significantly reduce shuffle overhead.
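      A minimal pure-Python sketch of the idea, assuming the input is already split into partitions, each a list of (group key, value) tuples. It does not use Spark or reflect Spark's actual implementation; the function names (partial_distinct, shuffle_partials, avg_distinct, naive_shuffle) are hypothetical, and the "shuffle" is simulated by merging dictionaries. It compares how many records are sent with and without map-side deduplication for AVG(DISTINCT value) GROUP BY key.

```python
from collections import defaultdict

def naive_shuffle(partitions):
    """No partial aggregation: every (key, value) record is shuffled."""
    shuffled = defaultdict(list)
    sent = 0
    for part in partitions:
        for key, value in part:
            shuffled[key].append(value)
            sent += 1
    return shuffled, sent

def partial_distinct(partition):
    """Map side (hypothetical): drop duplicate (key, value) pairs within one partition."""
    local = defaultdict(set)
    for key, value in partition:
        local[key].add(value)
    return local

def shuffle_partials(partials):
    """Simulated shuffle + reduce side: merge each group's per-partition value sets."""
    merged = defaultdict(set)
    sent = 0
    for local in partials:
        for key, values in local.items():
            merged[key] |= values  # duplicates across partitions are removed here
            sent += len(values)
    return merged, sent

def avg_distinct(partitions):
    """AVG(DISTINCT value) per key, using partial deduplication before the shuffle."""
    merged, sent = shuffle_partials([partial_distinct(p) for p in partitions])
    result = {key: sum(vals) / len(vals) for key, vals in merged.items()}
    return result, sent

# Example data: two partitions with repeated values inside and across partitions.
partitions = [
    [("a", 1), ("a", 1), ("a", 2), ("b", 5)],
    [("a", 2), ("a", 3), ("b", 5), ("b", 5)],
]

result, sent = avg_distinct(partitions)
_, naive_sent = naive_shuffle(partitions)
print(result, sent, naive_sent)
```

      Local deduplication only removes duplicates within a single partition; values repeated across partitions (here ("a", 2) and ("b", 5)) are still sent more than once and are removed during the final merge, so the result is the same as the naive approach while fewer records cross the shuffle.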

            People

              Assignee: Yin Huai (yhuai)
              Reporter: Cheng Hao (chenghao)
              Votes: 0
              Watchers: 6