Spark / SPARK-4366 Aggregation Improvement / SPARK-10167

We need to explicitly use transformDown when rewriting aggregation results


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: SQL
    • Labels: None

    Description

      Right now, we explicitly use transformDown at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala#L105 and https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala#L130. We should also be explicit about using transformDown at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala#L300 and https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala#L334 (right now, transform means transformDown). The reason we need transformDown is that when we rewrite the final aggregate results, we should always match aggregate functions before grouping expressions. If we used transformUp, we could match a grouping expression first whenever a grouping expression appears as the child of an aggregate function, because the child would be rewritten before the enclosing aggregate function is visited.
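
      To illustrate why the traversal order matters, here is a minimal, self-contained Scala sketch. The Expr, AggFunc, and AttrRef classes and the substitution maps below are hypothetical stand-ins for the real Catalyst expressions and rewrite maps, not Spark's actual API; the transformDown/transformUp methods only mirror the pre-order/post-order semantics of Catalyst's transform methods.

      // Hypothetical, simplified stand-ins for Catalyst expression nodes.
      sealed trait Expr {
        def children: Seq[Expr]
        def withChildren(newChildren: Seq[Expr]): Expr

        // Pre-order rewrite: the rule sees a node before its children are rewritten.
        def transformDown(rule: PartialFunction[Expr, Expr]): Expr = {
          val afterRule = if (rule.isDefinedAt(this)) rule(this) else this
          afterRule.withChildren(afterRule.children.map(_.transformDown(rule)))
        }

        // Post-order rewrite: children are rewritten before the node itself is seen.
        def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
          val afterChildren = withChildren(children.map(_.transformUp(rule)))
          if (rule.isDefinedAt(afterChildren)) rule(afterChildren) else afterChildren
        }
      }

      // An attribute reference, e.g. the grouping column `key`.
      case class AttrRef(name: String) extends Expr {
        def children: Seq[Expr] = Nil
        def withChildren(newChildren: Seq[Expr]): Expr = this
      }

      // An aggregate function applied to a child expression, e.g. sum(key).
      case class AggFunc(child: Expr) extends Expr {
        def children: Seq[Expr] = Seq(child)
        def withChildren(newChildren: Seq[Expr]): Expr = copy(child = newChildren.head)
      }

      object RewriteOrderDemo extends App {
        // A final aggregate result that uses a grouping expression as the
        // child of an aggregate function.
        val groupingExpr = AttrRef("key")
        val resultExpr   = AggFunc(groupingExpr)

        // Substitution maps keyed by the *original* expressions, mirroring how
        // the planner replaces aggregate functions and grouping expressions
        // with the attributes output by the aggregate operator.
        val aggMap   = Map[Expr, Expr](AggFunc(groupingExpr) -> AttrRef("sumResult"))
        val groupMap = Map[Expr, Expr](groupingExpr -> AttrRef("keyOutput"))

        val rewrite: PartialFunction[Expr, Expr] = {
          case e if aggMap.contains(e)   => aggMap(e)
          case e if groupMap.contains(e) => groupMap(e)
        }

        // Top-down matches the aggregate function before its grouping-expression
        // child, so the whole function is replaced by its result attribute.
        println(resultExpr.transformDown(rewrite)) // AttrRef(sumResult)

        // Bottom-up rewrites the grouping expression first; the parent then
        // becomes AggFunc(AttrRef(keyOutput)), which no longer matches its map
        // entry and is left unrewritten.
        println(resultExpr.transformUp(rewrite))   // AggFunc(AttrRef(keyOutput))
      }

      In the bottom-up case the grouping expression is rewritten before the enclosing aggregate function, so the aggregate function no longer matches its entry in the substitution map and is left unrewritten; the top-down traversal avoids this by matching the aggregate function first.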

      There is nothing wrong with the current master. We just want to make sure we do not introduce bugs if the default behavior of transform is ever changed from transformDown to transformUp, which I think is very unlikely (but just in case).

          People

            Assignee: Josh Rosen (joshrosen)
            Reporter: Yin Huai (yhuai)
