Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.3.0, 2.3.1
- Fix Version/s: None
- Component/s: None
Description
We compute some numbers in the 0-10 range in a pipeline using Spark SQL, then average them. In some cases the average comes out as null, to our surprise (and disappointment).
After a bit of digging, it looks like these numbers have ended up with the decimal(37,30) type. I've got a Spark shell (2.3.0 and 2.3.1) repro with this type:
scala> (1 to 10000).map(_*0.001).toDF.createOrReplaceTempView("x")
scala> spark.sql("select cast(value as decimal(37, 30)) as v from x").createOrReplaceTempView("x")
scala> spark.sql("select avg(v) from x").show
+------+
|avg(v)|
+------+
|  null|
+------+
For up to 4471 numbers it is able to calculate the average. For 4472 or more numbers it's null. That boundary lines up with the running sum crossing 10^4 (the sum of the first 4471 values is 9997.156; of the first 4472, 10001.628), which suggests the aggregate internally casts the sum to a decimal type with only four integer digits and that cast overflows to null.
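To pin down the boundary, here is a minimal sketch (the view name y is mine; same cast as in the repro above). Per the behavior described, the first average succeeds and the second comes back null:
scala> (1 to 4471).map(_*0.001).toDF.createOrReplaceTempView("y")
scala> spark.sql("select avg(cast(value as decimal(37, 30))) from y").show  // works, ~2.236
scala> (1 to 4472).map(_*0.001).toDF.createOrReplaceTempView("y")
scala> spark.sql("select avg(cast(value as decimal(37, 30))) from y").show  // null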
Now I'll just change these numbers to double (see the sketch below). But we got these types entirely automatically; we never asked for decimal. If this is the default type, it's important to support averaging a handful of them. (Sorry for the bitterness. I like double more.)
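A minimal sketch of that workaround, reusing the view x from the repro: casting to double before aggregating sidesteps the decimal overflow, at the cost of exact decimal semantics.
scala> spark.sql("select avg(cast(v as double)) from x").show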
Curiously, sum() works. And count() too. So it's quite the surprise that avg() fails.
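For completeness, a sketch confirming those two on the same view x. Computing sum(v) / count(v) by hand might serve as a stopgap, though that's my assumption; I haven't verified that the division itself avoids the same overflow on 2.3.x.
scala> spark.sql("select sum(v), count(v) from x").show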
Issue Links
- duplicates: SPARK-24957 "Decimal arithmetic can lead to wrong values using codegen" (Resolved)