Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Information Provided
Affects Version/s: 2.4.3
Fix Version/s: None
Component/s: None
Description
When doing multiplication with PySpark, it seems PySpark is losing precision.
For example, when multiplying two decimals with precision (38,10), the result type is decimal(38,6) instead of decimal(38,10). It also truncates the result to three decimal places, which is an incorrect result.
from decimal import Decimal
from pyspark.sql.types import DecimalType, StructType, StructField

schema = StructType([
    StructField("amount", DecimalType(38, 10)),
    StructField("fx", DecimalType(38, 10)),
])
df = spark.createDataFrame([(Decimal(233.00), Decimal(1.1403218880))], schema=schema)
df.printSchema()
df = df.withColumn("amount_usd", df.amount * df.fx)
df.printSchema()
df.show()
Result
>>> df.printSchema()
root
 |-- amount: decimal(38,10) (nullable = true)
 |-- fx: decimal(38,10) (nullable = true)
 |-- amount_usd: decimal(38,6) (nullable = true)

>>> df = df.withColumn("amount_usd", df.amount * df.fx)
>>> df.printSchema()
root
 |-- amount: decimal(38,10) (nullable = true)
 |-- fx: decimal(38,10) (nullable = true)
 |-- amount_usd: decimal(38,6) (nullable = true)

>>> df.show()
+--------------+------------+----------+
|        amount|          fx|amount_usd|
+--------------+------------+----------+
|233.0000000000|1.1403218880|265.695000|
+--------------+------------+----------+
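The decimal(38,6) result type appears to follow the SQL-style decimal arithmetic rules Spark applies (with spark.sql.decimalOperations.allowPrecisionLoss at its default of true): the product of decimal(p1,s1) and decimal(p2,s2) wants precision p1 + p2 + 1 = 77 and scale s1 + s2 = 20, which does not fit in 38 digits, so the precision is capped at 38 and the scale is cut back, keeping at least 6 fractional digits. A minimal sketch of that rule, assuming the 38/6 constants match Spark 2.4's behavior:

# Rough sketch (not Spark's actual code) of the result type derived for
# decimal(p1,s1) * decimal(p2,s2) when precision loss is allowed (default).
MAX_PRECISION = 38
MINIMUM_ADJUSTED_SCALE = 6

def multiply_result_type(p1, s1, p2, s2):
    precision = p1 + p2 + 1
    scale = s1 + s2
    if precision <= MAX_PRECISION:
        return precision, scale
    # Does not fit: cap precision at 38 and drop fractional digits first,
    # but never go below MINIMUM_ADJUSTED_SCALE digits of scale.
    int_digits = precision - scale
    adjusted_scale = max(MAX_PRECISION - int_digits, MINIMUM_ADJUSTED_SCALE)
    return MAX_PRECISION, adjusted_scale

print(multiply_result_type(38, 10, 38, 10))  # (38, 6)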
When rounding to two decimals, it returns 265.70, but the correct result is 265.6949999040, which rounds to 265.69.
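For comparison, Python's decimal module shows where the 265.70 comes from: the exact product rounds straight to 265.69, while first rounding to the decimal(38,6) value and then to two decimals rounds up to 265.70. A small sketch, assuming half-up rounding (which matches the 265.695000 shown above):

from decimal import Decimal, ROUND_HALF_UP

amount = Decimal("233.0000000000")
fx = Decimal("1.1403218880")

exact = amount * fx
print(exact)                                                          # 265.69499990400000000000

# Rounding the exact product directly to two decimals:
print(exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))        # 265.69

# First rounding to the decimal(38,6) value Spark produces, then to two
# decimals (double rounding):
spark_value = exact.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)
print(spark_value)                                                    # 265.695000
print(spark_value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 265.70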