Description
Since Spark 3.1.1, casting a string with many digits to a decimal type returns NULL. If the total number of digits before and after the decimal point is 38 or fewer, the rounded value is returned. With 39 or more digits, NULL is returned, even when the rounded value would fit in the target type.
This worked as expected in Spark 3.0.x and earlier.
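The digit-count rule described above can be expressed as a small plain-Python check. This is a sketch that models only the behavior observed in this report, not Spark's actual cast code; `total_digits` and `reported_to_return_null` are hypothetical helper names. The threshold of 39 is taken from the report; note that 38 is the maximum precision of Spark's `DecimalType`, which may be relevant, but that link is an assumption, not a diagnosis.

```python
# Threshold reported in this issue: since Spark 3.1.1, strings with 39 or
# more digits cast to NULL. (Assumption: this models observed behavior only.)
REPORTED_NULL_THRESHOLD = 39


def total_digits(value: str) -> int:
    """Count digits before and after the decimal point of a plain decimal string.

    Hypothetical helper for illustration: the sign and the '.' are not counted.
    """
    integer_part, _, fraction_part = value.strip().lstrip("+-").partition(".")
    if not (integer_part + fraction_part).isdigit():
        raise ValueError(f"not a plain decimal string: {value!r}")
    return len(integer_part) + len(fraction_part)


def reported_to_return_null(value: str) -> bool:
    """True if the behavior reported for Spark 3.1.1 predicts NULL for this string."""
    return total_digits(value) >= REPORTED_NULL_THRESHOLD


if __name__ == "__main__":
    # The three inputs used in the reproduction steps of this issue.
    for s in [
        "28.9259999999999983799625624669715762138",
        "28.925999999999998379962562466971576213",
        "2.9259999999999983799625624669715762138",
    ]:
        print(s, total_digits(s), reported_to_return_null(s))
```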
Code to reproduce:
- A string with 2 digits before the decimal point and 37 digits after it (39 digits in total) returns null:
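from pyspark.sql.functions import col
from pyspark.sql.types import StringType

# Assumes an existing SparkSession named `spark` (e.g. the pyspark shell).
data = ['28.9259999999999983799625624669715762138']
dfs = spark.createDataFrame(data, StringType())
dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
dfd.show(truncate=False)
+-----+
|value|
+-----+
|null |
+-----+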
- A string with 2 digits before the decimal point and 36 digits after it (38 digits in total) returns the expected decimal value:
data = ['28.925999999999998379962562466971576213']
dfs = spark.createDataFrame(data, StringType())
dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
dfd.show(truncate=False)
+--------+
|value   |
+--------+
|28.92600|
+--------+
- A string with 1 digit before the decimal point and 37 digits after it (38 digits in total) returns the expected decimal value:
data = ['2.9259999999999983799625624669715762138']
dfs = spark.createDataFrame(data, StringType())
dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
dfd.show(truncate=False)
+-------+
|value  |
+-------+
|2.92600|
+-------+
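For comparison, the value each cast would be expected to produce can be computed outside Spark with Python's `decimal` module. This is a sketch under stated assumptions: `cast_to_decimal` is a hypothetical helper, rounding is assumed to be round-half-up, and a value whose integer part needs more than `precision - scale` digits is treated as not fitting (returned as `None`). Under these assumptions the 39-digit input rounds to 28.92600, the same value Spark 3.1.1 returns for the 38-digit input, rather than NULL.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional


def cast_to_decimal(value: str, precision: int, scale: int) -> Optional[Decimal]:
    """Round a decimal string to `scale` places and check it fits DECIMAL(precision, scale).

    Hypothetical plain-Python model of the expected cast result, assuming
    round-half-up. Returns None if the rounded value does not fit.
    """
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=5 -> 1E-5
    # Decimal(value) parses the string exactly, regardless of its digit count.
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    # DECIMAL(p, s) leaves p - s digits for the integer part.
    if rounded.adjusted() + 1 > precision - scale:
        return None
    return rounded


if __name__ == "__main__":
    for s in [
        "28.9259999999999983799625624669715762138",
        "28.925999999999998379962562466971576213",
        "2.9259999999999983799625624669715762138",
    ]:
        print(s, cast_to_decimal(s, 10, 5))
```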