Details
- Type: Brainstorming
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.0.0
- Fix Version/s: None
- Component/s: None
Description
Opening this issue as a follow-up to a discussion/question on this PR for an optimization involving deterministic UDFs: https://github.com/apache/spark/pull/24593#pullrequestreview-237361795
"We even should discuss whether all UDFs must be deterministic or non-deterministic by default."
Today in Spark 2.4, Scala UDFs are implicitly marked deterministic by default. To mark a UDF as non-deterministic, users need to call the asNondeterministic() method.
The concern expressed is that users are not aware of this property and its implications.
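A minimal sketch of the behavior in question, assuming Spark 2.4+ on the classpath (the session setup and names here are illustrative, not from the issue):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfDeterminismDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-determinism")
      .getOrCreate()

    // UDFs are implicitly deterministic by default, even when their body
    // is not -- the optimizer may reorder, deduplicate, or re-execute them.
    val randomUdf = udf(() => scala.util.Random.nextDouble())
    println(randomUdf.deterministic) // true, despite the random body

    // The user must opt out explicitly:
    val nonDetUdf = randomUdf.asNondeterministic()
    println(nonDetUdf.deterministic) // false

    spark.stop()
  }
}
```

Because nothing forces the explicit opt-out, a user who wraps a side-effecting or random function in udf() silently gets deterministic semantics, which is the implication users may not be aware of.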
Issue Links
- is blocked by SPARK-27969: Non-deterministic expressions in filters or projects can unnecessarily prevent all scan-time column pruning, harming performance (Resolved)