[SPARK-22347] UDF is evaluated when 'F.when' condition is false - ASF JIRA

Details

Type: Documentation
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: 2.2.0
Fix Version/s: None
Component/s: PySpark
Labels: None

Description

Here's a simple example of how to reproduce this:

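from pyspark.sql import functions as F, Row, types

# Build a UDF that divides 10 by its argument; it raises ZeroDivisionError for 0.
def Divide10():
    def fn(value): return 10 / int(value)
    return F.udf(fn, types.IntegerType())

# `sc` is the SparkContext provided by the PySpark shell.
df = sc.parallelize([Row(x=5), Row(x=0)]).toDF()

x = F.col('x')
# The UDF should only apply where x > 0, yet it is also evaluated for x = 0.
df2 = df.select(F.when((x > 0), Divide10()(x)))
df2.show(200)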

This raises a division by zero error, even though `F.when` is meant to exclude every row where `x <= 0`. I believe the correct behavior is to not evaluate the UDF for rows where the `F.when` condition is false.

Interestingly enough, when the `F.when` condition is set to `F.lit(False)`, the error is not raised and all rows resolve to `null`, which is the expected result.
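
Since this issue was resolved as Won't Fix, one possible workaround is to repeat the guard inside the UDF itself, so that it is harmless even if Spark evaluates it for rows the `F.when` condition rejects. The following is only a sketch under that assumption, not a fix taken from this ticket: `safe_divide10` and `guarded_column` are hypothetical names, and `//` is used so the result matches `IntegerType` under Python 3.

```python
def safe_divide10(value):
    """Return 10 // value, or None when the input is null or not positive."""
    if value is None:
        return None
    value = int(value)
    if value <= 0:
        # Mirror the F.when condition (x > 0) inside the UDF itself.
        return None
    return 10 // value


def guarded_column(col_name="x"):
    """Build the guarded column expression (hypothetical helper)."""
    # Imported lazily so the pure-Python guard above can be exercised without Spark.
    from pyspark.sql import functions as F, types

    divide10 = F.udf(safe_divide10, types.IntegerType())
    x = F.col(col_name)
    # F.when is kept for readability; the UDF no longer depends on it for safety.
    return F.when(x > 0, divide10(x))
```

With the `df` from the reproduction above, `df.select(guarded_column('x')).show()` would be the equivalent call. The guard is duplicated on purpose: the UDF has to tolerate every input value it might be handed, whatever the surrounding expression says.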

Issue Links

is duplicated by

SPARK-25060 PySpark UDF in case statement is always run

Resolved

links to

[Github] Pull Request #19584 (viirya)

[Github] Pull Request #19592 (viirya)

[Github] Pull Request #19617 (viirya)

People

Assignee: L. C. Hsieh

Reporter: Nicolas Porter

Votes: 0

Watchers: 6

Dates

Created: 25/Oct/17 00:52

Updated: 13/Aug/18 17:55

Resolved: 13/Aug/18 16:28