Description
Fix a decimal overflow issue in the average aggregate for decimals in ANSI mode. Related to SPARK-32018 and SPARK-28067, which addressed the same problem for decimal sum.
Repro:

```scala
import org.apache.spark.sql.functions._

spark.conf.set("spark.sql.ansi.enabled", true)

val df = Seq(
  (BigDecimal("10000000000000000000"), 1),
  (BigDecimal("10000000000000000000"), 1),
  (BigDecimal("10000000000000000000"), 2),
  (BigDecimal("10000000000000000000"), 2),
  (BigDecimal("10000000000000000000"), 2),
  (BigDecimal("10000000000000000000"), 2),
  (BigDecimal("10000000000000000000"), 2),
  (BigDecimal("10000000000000000000"), 2),
  (BigDecimal("10000000000000000000"), 2),
  (BigDecimal("10000000000000000000"), 2),
  (BigDecimal("10000000000000000000"), 2),
  (BigDecimal("10000000000000000000"), 2)
).toDF("decNum", "intNum")

val df2 = df.withColumnRenamed("decNum", "decNum2")
  .join(df, "intNum")
  .agg(mean("decNum"))

df2.show(40, false)
```
In ANSI mode this should throw an overflow exception (the underlying sum overflows), but instead it returns:
```
+-----------+
|avg(decNum)|
+-----------+
|null       |
+-----------+
```
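A back-of-the-envelope check (plain Scala, no Spark needed) of why the intermediate sum overflows. This assumes Spark's usual inference of `decimal(38,18)` for a Scala `BigDecimal`, which leaves 38 - 18 = 20 integral digits, so the largest representable value is just under 10^20:

```scala
// Each row holds 10^19; the inferred input type decimal(38,18)
// caps the integral part at 20 digits (max value just under 10^20).
val value = BigDecimal("10000000000000000000") // 10^19

// The self-join on intNum multiplies row counts per key:
// key 1 appears 2 times  -> 2 * 2   = 4 joined rows
// key 2 appears 10 times -> 10 * 10 = 100 joined rows
val joinedRows = 2 * 2 + 10 * 10               // 104

val sum = value * joinedRows                   // 1.04 * 10^21
val maxDecimal38_18 =
  BigDecimal(10).pow(20) - BigDecimal(1) / BigDecimal(10).pow(18)

println(sum > maxDecimal38_18)                 // true: the sum cannot fit
```

So the sum aggregate overflows `decimal(38,18)` and becomes null, and the division in the average then silently propagates that null instead of raising an error in ANSI mode.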