Details
Description
Decimal values from Java beans are incorrectly scaled when used with functions like first/last/max...
This problem came because Encoders.bean always set Decimal values as DecimalType(this.MAX_PRECISION(), 18).
Usually it's not a problem if you use numeric functions like sum but for functions like first/last/max... it is a problem.
How to reproduce this error:
Using this class as an example:
public class Foo implements Serializable { private String group; private BigDecimal var; public BigDecimal getVar() { return var; } public void setVar(BigDecimal var) { this.var = var; } public String getGroup() { return group; } public void setGroup(String group) { this.group = group; } }
And a dummy code to create some objects:
Dataset<Foo> ds = spark.range(5) .map(l -> { Foo foo = new Foo(); foo.setGroup("" + l); foo.setVar(BigDecimal.valueOf(l + 0.1111)); return foo; }, Encoders.bean(Foo.class)); ds.printSchema(); ds.show(); +-----+------+ |group| var| +-----+------+ | 0|0.1111| | 1|1.1111| | 2|2.1111| | 3|3.1111| | 4|4.1111| +-----+------+
We can see that the DecimalType is precision 38 and 18 scale and all values are show correctly.
But if we use a first function, they are scaled incorrectly:
ds.groupBy(col("group")) .agg( first("var") ) .show(); +-----+-----------------+ |group|first(var, false)| +-----+-----------------+ | 3| 3.1111E-14| | 0| 1.111E-15| | 1| 1.1111E-14| | 4| 4.1111E-14| | 2| 2.1111E-14| +-----+-----------------+
This incorrect behavior cannot be reproduced if we use "numerical "functions like sum or if the column is cast a new Decimal Type.
ds.groupBy(col("group")) .agg( sum("var") ) .show(); +-----+--------------------+ |group| sum(var)| +-----+--------------------+ | 3|3.111100000000000000| | 0|0.111100000000000000| | 1|1.111100000000000000| | 4|4.111100000000000000| | 2|2.111100000000000000| +-----+--------------------+ ds.groupBy(col("group")) .agg( first(col("var").cast(new DecimalType(38, 8))) ) .show(); +-----+----------------------------------------+ |group|first(CAST(var AS DECIMAL(38,8)), false)| +-----+----------------------------------------+ | 3| 3.11110000| | 0| 0.11110000| | 1| 1.11110000| | 4| 4.11110000| | 2| 2.11110000| +-----+----------------------------------------+