Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version: 3.1.0
Description
I was testing Spark 3.1.0 and noticed that log(NaN) now returns null, whereas in Spark 3.0 it returned NaN. I'm not an expert in this, but I thought NaN was the correct result.
Spark 3.1.0 example:

>>> df.selectExpr(["value", "log1p(value)"]).show()
+-------------+------------------+
|        value|      LOG1P(value)|
+-------------+------------------+
|-3.4028235E38|              null|
| 3.4028235E38| 88.72283906194683|
|          0.0|               0.0|
|         -0.0|              -0.0|
|          1.0|0.6931471805599453|
|         -1.0|              null|
|          NaN|              null|
+-------------+------------------+
Spark 3.0.0 example:

>>> df.selectExpr(["value", "log1p(value)"]).show()
+-------------+------------------+
|        value|      LOG1P(value)|
+-------------+------------------+
|-3.4028235E38|              null|
| 3.4028235E38| 88.72283906194683|
|          0.0|               0.0|
|         -0.0|              -0.0|
|          1.0|0.6931471805599453|
|         -1.0|              null|
|          NaN|               NaN|
+-------------+------------------+
Note that the same regression also occurs for log1p, log2, and log10.
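For reference, IEEE-754 floating-point semantics propagate NaN through logarithm functions rather than turning it into a missing value, which is why NaN looks like the correct result here. A minimal sketch of that behavior using Python's standard `math` module (not Spark itself, just the underlying floating-point convention):

```python
import math

# IEEE-754 semantics: a NaN input to a log function yields NaN,
# not an error and not a "missing" value.
nan = float("nan")

for fn in (math.log1p, math.log2, math.log10):
    result = fn(nan)
    # NaN is the only float not equal to itself, so check with isnan.
    print(fn.__name__, "->", result, "| isnan:", math.isnan(result))
```

Each call prints a NaN result, matching the Spark 3.0 column above where `LOG1P(NaN)` is `NaN` rather than `null`.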