Spark / SPARK-9592

First and Last, as implemented on AggregateExpression1, compute their values over the entire DataFrame partition instead of per GroupedData group.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels: None

    Description

      In the current implementation, the First and Last aggregates compute their values over the entire DataFrame partition and then return that same value for every GroupedData group in the partition.
      sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
      The fix makes the First and Last aggregates compute the first and last value per GroupedData group instead of over the entire DataFrame partition.
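
The difference between the reported behavior and the intended behavior can be sketched in plain Python. This is only an illustrative model, not Spark's code: a partition is assumed to be an ordered list of (group key, value) rows, and the function names buggy_first_last and fixed_first_last are hypothetical. The issue does not show how aggregates.scala produced the wrong result, so the "buggy" function reproduces only the symptom described above.

```python
def buggy_first_last(partition):
    """Model of the reported symptom: first/last are taken once over the
    whole partition and the same pair is returned for every group key."""
    if not partition:
        return {}
    first = partition[0][1]   # first value in the partition, any group
    last = partition[-1][1]   # last value in the partition, any group
    return {key: (first, last) for key, _ in partition}


def fixed_first_last(partition):
    """Model of the intended behavior: first/last are tracked per group key."""
    result = {}
    for key, value in partition:
        if key not in result:
            # First row seen for this group: it is both first and last so far.
            result[key] = (value, value)
        else:
            # Keep the group's first value, update its last value.
            result[key] = (result[key][0], value)
    return result


# Hypothetical sample partition with two interleaved groups.
rows = [("a", 1), ("b", 10), ("a", 2), ("b", 20)]

buggy = buggy_first_last(rows)
fixed = fixed_first_last(rows)

print(buggy)  # {'a': (1, 20), 'b': (1, 20)}
print(fixed)  # {'a': (1, 2), 'b': (10, 20)}
```

In the buggy model both groups report the partition-wide pair (1, 20), while the fixed model reports each group's own first and last values.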

          People

            Assignee: yhuai Yin Huai
            Reporter: ggupta81 gaurav
            Votes: 0
            Watchers: 4

              Time Tracking

                Original Estimate: 4h
                Remaining Estimate: 4h
                Time Spent: Not Specified