[SPARK-23233] asNondeterministic in Python UDF not being set when the UDF is called at least once - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.3.0
Component/s: PySpark, SQL
Labels:
None

Description

With this diff

diff --git a/python/pyspark/sql/udf.py b/python/pyspark/sql/udf.py
index de96846c5c7..026a78bf547 100644
--- a/python/pyspark/sql/udf.py
+++ b/python/pyspark/sql/udf.py
@@ -180,6 +180,7 @@ class UserDefinedFunction(object):
         wrapper.deterministic = self.deterministic
         wrapper.asNondeterministic = functools.wraps(
             self.asNondeterministic)(lambda: self.asNondeterministic()._wrapped())
+        wrapper._unwrapped = lambda: self
         return wrapper

     def asNondeterministic(self):

>>> from pyspark.sql.functions import udf
>>> f = udf(lambda x: x)
>>> spark.range(1).select(f("id"))
DataFrame[<lambda>(id): string]
>>> f._unwrapped()._judf_placeholder.udfDeterministic()
True
>>> ndf = f.asNondeterministic()
>>> ndf.deterministic
False
>>> spark.range(1).select(ndf("id"))
DataFrame[<lambda>(id): string]
>>> ndf._unwrapped()._judf_placeholder.udfDeterministic()
True

Seems we don't actually update the deterministic once it's called due to cache in Python side.

Attachments

Issue Links

links to

[Github] Pull Request #20409 (HyukjinKwon)

Activity

People

Assignee:: Hyukjin Kwon

Reporter:: Hyukjin Kwon

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 26/Jan/18 15:00

Updated:: 12/Dec/22 17:50

Resolved:: 27/Jan/18 19:27