Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Invalid
2.3.0
None
None
Description
I get different results for aggregate functions (even for sum and count) when the window is ordered, "Window.partitionBy(column).orderBy(column)", and when it is not ordered, "Window.partitionBy(column)".
Example:
test("count, sum, stddev_pop functions over window") {
  val df = Seq(
    ("a", 1, 100.0),
    ("b", 1, 200.0)).toDF("key", "partition", "value")
  df.createOrReplaceTempView("window_table")
  checkAnswer(
    df.select(
      $"key",
      count("value").over(Window.partitionBy("partition")),
      sum("value").over(Window.partitionBy("partition")),
      stddev_pop("value").over(Window.partitionBy("partition"))
    ),
    Seq(
      Row("a", 2, 300.0, 50.0),
      Row("b", 2, 300.0, 50.0)))
}

test("count, sum, stddev_pop functions over ordered by window") {
  val df = Seq(
    ("a", 1, 100.0),
    ("b", 1, 200.0)).toDF("key", "partition", "value")
  df.createOrReplaceTempView("window_table")
  checkAnswer(
    df.select(
      $"key",
      count("value").over(Window.partitionBy("partition").orderBy("key")),
      sum("value").over(Window.partitionBy("partition").orderBy("key")),
      stddev_pop("value").over(Window.partitionBy("partition").orderBy("key"))
    ),
    Seq(
      Row("a", 2, 300.0, 50.0),
      Row("b", 2, 300.0, 50.0)))
}
The "count, sum, stddev_pop functions over ordered by window" test fails with the error:
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
!struct<>                   struct<key:string,count(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):bigint,sum(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double,stddev_pop(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double>
![a,2,300.0,50.0]           [a,1,100.0,0.0]
 [b,2,300.0,50.0]           [b,2,300.0,50.0]
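
The difference above is consistent with how window frames default, which may be why the issue was resolved as Invalid (the resolution comment is not part of this excerpt, so that is an assumption). When a window spec has an ordering but no explicit frame, Spark uses a growing frame, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the aggregates become running values. The following is a minimal sketch, not the project's test code: the object name, method name, and column aliases are hypothetical, and it assumes a local Spark 2.1+ session. It compares the default ordered window with one whose frame is widened explicitly to the whole partition.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{count, stddev_pop, sum}

// Hypothetical helper object, for illustration only.
object OrderedWindowFrameSketch {

  def compare(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Same data as in the report.
    val df = Seq(("a", 1, 100.0), ("b", 1, 200.0)).toDF("key", "partition", "value")

    // Ordered window without an explicit frame: the frame grows from the
    // start of the partition up to the current row (running aggregates).
    val running = Window.partitionBy("partition").orderBy("key")

    // Same ordering, but the frame explicitly covers the whole partition.
    val whole = running.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

    df.select(
      $"key",
      count("value").over(running).as("running_count"),
      sum("value").over(running).as("running_sum"),
      count("value").over(whole).as("total_count"),
      sum("value").over(whole).as("total_sum"),
      stddev_pop("value").over(whole).as("total_stddev_pop")
    )
  }

  def main(args: Array[String]): Unit = {
    // Local session; the app name is arbitrary.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ordered-window-frame-sketch")
      .getOrCreate()
    compare(spark).show()
    spark.stop()
  }
}
```

Under these assumptions, the running_* columns should reproduce the values in the "Spark Answer" column above, while the total_* columns should match what the ordered test expected. If partition-wide totals are wanted, dropping orderBy from the window spec is the simpler option.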