spark.range(1,1000).distinct.withColumn("random", rand()).filter(col("random") > 0.3).orderBy("random").show
gives a wrong result.
In the optimized logical plan, the filter with the non-deterministic predicate is pushed beneath the aggregate operator, which should not happen: because rand() is non-deterministic, the pushed-down predicate is evaluated independently of the projected "random" column, so rows whose displayed "random" value is <= 0.3 can still appear in the output (see the sketch below).
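For reference, a minimal sketch for reproducing the issue and inspecting the plans outside spark-shell; the explicit imports, the local SparkSession, and the explain(true) call are the only additions to the snippet above and are assumptions for a standalone run:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, rand}

// Hypothetical local session just for the repro.
val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()

val df = spark.range(1, 1000)
  .distinct()
  .withColumn("random", rand())
  .filter(col("random") > 0.3)
  .orderBy("random")

// explain(true) prints the parsed, analyzed, and optimized logical plans plus the
// physical plan; the Filter carrying rand() appears below the Aggregate in the
// optimized plan.
df.explain(true)

// With the mis-placed filter, rows whose shown "random" value is <= 0.3 can still
// appear in the result.
df.show()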
cc Cheng Lian
[Github] Pull Request #17559 (viirya)
[Github] Pull Request #17562 (cloud-fan)