Details
- Type: Documentation
- Status: Resolved
- Resolution: Won't Fix
- Priority: Minor
- Affects Version: 2.2.0
Description
Here's a simple example of how to reproduce this:
from pyspark.sql import functions as F, Row, types

def Divide10():
    def fn(value):
        return 10 / int(value)
    return F.udf(fn, types.IntegerType())

df = sc.parallelize([Row(x=5), Row(x=0)]).toDF()
x = F.col('x')
df2 = df.select(F.when((x > 0), Divide10()(x)))
df2.show(200)
This raises a division-by-zero error, even though the `F.when` condition filters out all cases where `x <= 0`. I believe the correct behavior should be not to evaluate the UDF when the `F.when` condition is false.
Interestingly enough, when the `F.when` condition is set to `F.lit(False)`, the error is not raised and all rows resolve to `null`, which is the expected result.
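Since the resolution is "Won't Fix", a practical workaround is to make the UDF itself defensive rather than rely on `F.when` to short-circuit. Below is a minimal sketch of the guard logic in plain Python; the `divide10` helper here is hypothetical (not part of the report), and in Spark it would be wrapped with `F.udf(divide10, types.IntegerType())` just like `fn` above:

```python
# Hypothetical workaround sketch: because Spark may evaluate a UDF even for
# rows that the F.when condition excludes, guard against bad input inside
# the UDF body instead of relying on F.when to short-circuit.
def divide10(value):
    v = int(value)
    if v <= 0:
        return None  # Spark renders a returned None as null
    return 10 // v   # integer division, consistent with an IntegerType return


print(divide10(5))  # 2
print(divide10(0))  # None (instead of ZeroDivisionError)
```

With this guard, rows where `x <= 0` yield `null` whether or not the `F.when` branch is evaluated, matching the expected result described above.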
Issue Links
- is duplicated by SPARK-25060: PySpark UDF in case statement is always run (Resolved)