Spark / SPARK-22266

The same aggregate function was evaluated multiple times

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

      Description

      The same aggregate function should not be evaluated more than once, which is what the code comment below (patterns.scala:206) states. However, this did not work as expected.

            // A single aggregate expression might appear multiple times in resultExpressions.
            // In order to avoid evaluating an individual aggregate function multiple times, we'll
            // build a set of the distinct aggregate expressions and build a function which can
            // be used to re-write expressions so that they reference the single copy of the
            // aggregate function which actually gets computed.
      

      For example, the physical plan of

      SELECT a, max(b+1), max(b+1) + 1 FROM testData2 GROUP BY a
      

      was

      HashAggregate(keys=[a#23], functions=[max((b#24 + 1)), max((b#24 + 1))], output=[a#23, max((b + 1))#223, (max((b + 1)) + 1)#224])
      +- HashAggregate(keys=[a#23], functions=[partial_max((b#24 + 1)), partial_max((b#24 + 1))], output=[a#23, max#231, max#232])
         +- SerializeFromObject [assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true]).a AS a#23, assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true]).b AS b#24]
            +- Scan ExternalRDDScan[obj#22]
      

      In each HashAggregate, the same aggregate function max((b#24 + 1)) appeared twice (as partial_max((b#24 + 1)) in the partial aggregation), so it was evaluated twice instead of once.
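
      The intent stated in the code comment can be illustrated with a minimal sketch. It does not use Spark's Catalyst classes: `Expr`, `Col`, `Lit`, `Add`, `Max`, `AggRef`, and `AggDedupSketch` are hypothetical names invented for this example, and deduplication here relies on plain case-class equality, which is an assumption and not necessarily how Spark compares aggregate expressions.

```scala
// Hypothetical, simplified expression tree; these are NOT Spark's Catalyst classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Max(child: Expr) extends Expr     // stands in for an aggregate function
case class AggRef(index: Int) extends Expr   // reference to the single computed copy

object AggDedupSketch {
  // Collect every aggregate call in an expression, in traversal order.
  def collectAggs(e: Expr): List[Max] = e match {
    case m: Max    => List(m)
    case Add(l, r) => collectAggs(l) ++ collectAggs(r)
    case _         => Nil
  }

  // Rewrite an expression so each aggregate call points at its computed slot.
  def rewrite(e: Expr, slots: Map[Max, Int]): Expr = e match {
    case m: Max    => AggRef(slots(m))
    case Add(l, r) => Add(rewrite(l, slots), rewrite(r, slots))
    case other     => other
  }

  // Returns the distinct aggregates to compute and the rewritten result expressions.
  def plan(results: List[Expr]): (List[Max], List[Expr]) = {
    val distinctAggs = results.flatMap(collectAggs).distinct
    val slots = distinctAggs.zipWithIndex.toMap
    (distinctAggs, results.map(rewrite(_, slots)))
  }
}
```

      For the query above, `AggDedupSketch.plan(List(Col("a"), Max(Add(Col("b"), Lit(1))), Add(Max(Add(Col("b"), Lit(1))), Lit(1))))` yields a single aggregate to compute, with both result expressions referring to `AggRef(0)`. This is the behavior the comment describes, as opposed to the two copies shown in the physical plan.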

            People

            • Assignee: Maryann Xue (maryannxue)
            • Reporter: Maryann Xue (maryannxue)