Description
Using the GeometricMean UDAF example (https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html), I got different results from the same aggregation in two queries that differ only in the alias of the aggregate column:
scala> sqlContext.sql("select group_id, gm(id) from simple group by group_id").show()
+--------+---+
|group_id|_c1|
+--------+---+
|       0|0.0|
|       1|0.0|
|       2|0.0|
+--------+---+

scala> sqlContext.sql("select group_id, gm(id) as GeometricMean from simple group by group_id").show()
+--------+-----------------+
|group_id|    GeometricMean|
+--------+-----------------+
|       0|8.981385496571725|
|       1|7.301716979342118|
|       2|7.706253151292568|
+--------+-----------------+
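As a sanity check on which of the two outputs is plausible, the geometric mean can be recomputed in plain Scala without Spark. This is only a sketch under an assumption the report does not state: that the `simple` table was built as in the linked blog post, with ids 1 through 19 and `group_id = id % 3`. The object name `GeometricMeanCheck` and the helper `geometricMean` are hypothetical names introduced here for illustration.

```scala
object GeometricMeanCheck {
  // Geometric mean of a non-empty sequence: the n-th root of the product.
  def geometricMean(xs: Seq[Long]): Double = {
    require(xs.nonEmpty, "geometric mean is undefined for an empty sequence")
    math.pow(xs.map(_.toDouble).product, 1.0 / xs.size)
  }

  def main(args: Array[String]): Unit = {
    // ASSUMPTION: ids 1..19 grouped by id % 3, mirroring the blog example.
    val ids = 1L to 19L
    val byGroup = ids.groupBy(_ % 3)

    // Print one line per group, ordered by group_id.
    byGroup.toSeq.sortBy(_._1).foreach { case (groupId, members) =>
      println(f"$groupId%d -> ${geometricMean(members)}%.3f")
    }
  }
}
```

If that assumption holds, the values printed per group should be close to the aliased `GeometricMean` column above, which suggests the all-zero `_c1` output is the incorrect one.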
Attachments
Issue Links
- duplicates SPARK-11885 UDAF may nondeterministically generate wrong results (Resolved)