Spark / SPARK-22806

Window Aggregate functions: unexpected result at ordered partition

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      I get different results for aggregate functions (even for sum and count) depending on whether the window partition is ordered, "Window.partitionBy(column).orderBy(column)", or not ordered, "Window.partitionBy(column)".

      Example:

      test("count, sum, stddev_pop functions over window") {
        val df = Seq(
          ("a", 1, 100.0),
          ("b", 1, 200.0)).toDF("key", "partition", "value")
        df.createOrReplaceTempView("window_table")
        checkAnswer(
          df.select(
            $"key",
            count("value").over(Window.partitionBy("partition")),
            sum("value").over(Window.partitionBy("partition")),
            stddev_pop("value").over(Window.partitionBy("partition"))
          ),
          Seq(
            Row("a", 2, 300.0, 50.0),
            Row("b", 2, 300.0, 50.0)))
      }

      test("count, sum, stddev_pop functions over ordered by window") {
        val df = Seq(
          ("a", 1, 100.0),
          ("b", 1, 200.0)).toDF("key", "partition", "value")
        df.createOrReplaceTempView("window_table")
        checkAnswer(
          df.select(
            $"key",
            count("value").over(Window.partitionBy("partition").orderBy("key")),
            sum("value").over(Window.partitionBy("partition").orderBy("key")),
            stddev_pop("value").over(Window.partitionBy("partition").orderBy("key"))
          ),
          Seq(
            Row("a", 2, 300.0, 50.0),
            Row("b", 2, 300.0, 50.0)))
      }
      

      The second test, "count, sum, stddev_pop functions over ordered by window", fails with the following error:

      == Results ==
      !== Correct Answer - 2 ==   == Spark Answer - 2 ==
      !struct<>                   struct<key:string,count(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):bigint,sum(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double,stddev_pop(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double>
      ![a,2,300.0,50.0]           [a,1,100.0,0.0]
       [b,2,300.0,50.0]           [b,2,300.0,50.0]
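
      The "Invalid" resolution is consistent with the default window frames described in the Spark Window API documentation: when no ordering is defined, the frame covers the whole partition, but once an ordering is defined and no frame is given (the unspecifiedframe$() in the plan above), a growing frame from the start of the partition up to the current row is used. Row "a" therefore only sees itself (1, 100.0, 0.0), while row "b" sees both rows. The sketch below is a minimal illustration rather than the project's own test: it assumes a local SparkSession, and the object and helper names (OrderedWindowFrames, sample, aggregates, running, wholePartition) are hypothetical. It compares the default ordered window with one whose frame is set explicitly through rowsBetween.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.{Window, WindowSpec}
import org.apache.spark.sql.functions.{col, count, stddev_pop, sum}

// Hypothetical demo object; the names below are illustrative, not part of Spark.
object OrderedWindowFrames {

  // The same two rows that the report uses.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("a", 1, 100.0), ("b", 1, 200.0)).toDF("key", "partition", "value")
  }

  // Evaluates count, sum and stddev_pop of "value" over the given window and
  // returns (key, count, sum, stddev_pop) tuples sorted by key.
  def aggregates(df: DataFrame, w: WindowSpec): Seq[(String, Long, Double, Double)] =
    df.select(
        col("key"),
        count("value").over(w),
        sum("value").over(w),
        stddev_pop("value").over(w))
      .collect()
      .map(r => (r.getString(0), r.getLong(1), r.getDouble(2), r.getDouble(3)))
      .toSeq
      .sortBy(_._1)

  // Ordered window with no explicit frame: the frame grows up to the current row.
  val running: WindowSpec = Window.partitionBy("partition").orderBy("key")

  // Same partitioning and ordering, but the frame is pinned to the whole partition.
  val wholePartition: WindowSpec =
    running.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumption: a local session is enough for the demo
      .appName("ordered-window-frames")
      .getOrCreate()
    try {
      val df = sample(spark)
      aggregates(df, running).foreach(println)        // running aggregates per row
      aggregates(df, wholePartition).foreach(println) // whole-partition aggregates
    } finally {
      spark.stop()
    }
  }
}
```

      With the explicit frame, both rows should produce (2, 300.0, 50.0), which is what the second test expects. Dropping orderBy has the same effect when the ordering is not otherwise needed.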
      

      Attachments

        1. WindowFunctionsWithGroupByError.scala, 1 kB, Attila Zsolt Piros

          People

            Assignee: Unassigned
            Reporter: Attila Zsolt Piros (attilapiros)
            Votes: 0
            Watchers: 2