Details
- Type: Brainstorming
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.0.0
- Fix Version/s: None
- Component/s: None
Description
Opening this issue as a follow-up to a discussion/question on this PR for an optimization involving deterministic UDFs: https://github.com/apache/spark/pull/24593#pullrequestreview-237361795
"We even should discuss whether all UDFs must be deterministic or non-deterministic by default."
Today in Spark 2.4, Scala UDFs are implicitly marked deterministic by default. To mark a UDF as non-deterministic, users need to call the asNondeterministic() method.
The concern expressed is that users are not aware of this property and its implications.
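A minimal sketch of the behavior in question, assuming Spark 2.4+ on the classpath (the session setup and names here are illustrative, not from the issue):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfDeterminismDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-determinism")
      .getOrCreate()

    // UDFs are implicitly deterministic by default, even when their body
    // is not -- the optimizer may reorder, deduplicate, or re-execute them.
    val randomUdf = udf(() => scala.util.Random.nextDouble())
    println(randomUdf.deterministic) // true, despite the random body

    // The user must opt out explicitly:
    val nonDetUdf = randomUdf.asNondeterministic()
    println(nonDetUdf.deterministic) // false

    spark.stop()
  }
}
```

Because nothing forces the explicit opt-out, a user who wraps a side-effecting or random function in udf() silently gets deterministic semantics, which is the implication users may not be aware of.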
Issue Links
- is blocked by SPARK-27969: Non-deterministic expressions in filters or projects can unnecessarily prevent all scan-time column pruning, harming performance (Resolved)