[SPARK-20273] Disallow Non-deterministic Filter push-down into Join Conditions - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.2, 2.1.0
Fix Version/s: 2.2.0
Component/s: SQL
Labels:
None

Description

sql("SELECT t1.b, rand(0) as r FROM cachedData, cachedData t1 GROUP BY t1.b having r > 0.5").show()

We will get the following error:

Job aborted due to stage failure: Task 1 in stage 4.0 failed 1 times, most recent failure: Lost task 1.0 in stage 4.0 (TID 8, localhost, executor driver): java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
	at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
	at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)

Filters could be pushed down to the join conditions by the optimizer rule PushPredicateThroughJoin. However, we block users to add non-deterministics conditions by the analyzer (For details, see the PR https://github.com/apache/spark/pull/7535).

We should not push down non-deterministic conditions; otherwise, we should allow users to do it by explicitly initialize the non-deterministic expressions

Attachments

Issue Links

links to

[Github] Pull Request #17585 (gatorsmile)

Activity

People

Assignee:: Xiao Li

Reporter:: Xiao Li

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 10/Apr/17 05:58

Updated:: 10/Apr/17 16:18

Resolved:: 10/Apr/17 16:18