[SPARK-40565] Non-deterministic filters shouldn't get pushed to V2 file sources - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.0
Fix Version/s: 3.4.0
Component/s: SQL
Labels: None

Description

Currently, non-deterministic filters can be pushed down to V2 file sources. This differs from V1, which prevents non-deterministic filters from being pushed down.

Main consequences:

- Filtering on a partition column with a non-deterministic expression such as rand throws an exception:
  IllegalArgumentException: requirement failed: Nondeterministic expression org.apache.spark.sql.catalyst.expressions.Rand should be initialized before eval.
- A non-deterministic UDF that collects metrics via accumulators gets pushed down and reports the wrong metrics.
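
The metrics problem can be illustrated with a toy model in plain Python. This is only a sketch, not Spark code: the names Filter, scan, run and counting_udf are hypothetical, and it assumes that a pushed filter is evaluated once at the source and then again after the scan. Under that assumption, a side-effecting predicate is invoked more often than the number of input rows whenever it is pushed down, while a guard that pushes only deterministic filters keeps the count correct.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class Filter:
    """Hypothetical stand-in for a filter expression with a determinism flag."""
    predicate: Callable[[Row], bool]
    deterministic: bool


def scan(rows: List[Row], pushed: List[Filter]) -> List[Row]:
    # Source-side evaluation of whatever filters were pushed down.
    return [r for r in rows if all(f.predicate(r) for f in pushed)]


def run(rows: List[Row], filters: List[Filter], push_nondeterministic: bool) -> List[Row]:
    # Assumption of this sketch: non-deterministic filters are pushed only
    # when push_nondeterministic is True (the behaviour reported as a bug).
    pushed = [f for f in filters if f.deterministic or push_nondeterministic]
    scanned = scan(rows, pushed)
    # Assumption of this sketch: every filter is evaluated again after the scan.
    return [r for r in scanned if all(f.predicate(r) for f in filters)]


calls = {"n": 0}  # plays the role of an accumulator


def counting_udf(row: Row) -> bool:
    # Side effect makes this predicate unsafe to evaluate more than once per row.
    calls["n"] += 1
    return row["value"] % 2 == 0


rows = [{"value": i} for i in range(10)]
metric_filter = Filter(counting_udf, deterministic=False)


def measure(push_nondeterministic: bool) -> int:
    """Return how many times counting_udf ran for one query."""
    calls["n"] = 0
    run(rows, [metric_filter], push_nondeterministic)
    return calls["n"]


calls_when_pushed = measure(True)
calls_when_kept = measure(False)
print(calls_when_pushed, calls_when_kept)
```

In this model the returned rows are the same either way; only the invocation count differs, which is why the symptom shows up in the collected metrics rather than in the query result.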

Issue Links

links to

[Github] Pull Request #38003 (Kimahriman)

People

Assignee: Adam Binford

Reporter: Adam Binford

Votes: 0

Watchers: 3

Dates

Created: 26/Sep/22 12:56

Updated: 07/Oct/22 05:57

Resolved: 07/Oct/22 05:57