Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Fix Version: 2.1.0
Description
import org.apache.spark.sql.functions._

spark.range(1, 1000).distinct
  .withColumn("random", rand())
  .filter(col("random") > 0.3)
  .orderBy("random")
  .show

gives a wrong result.
In the optimized logical plan, the filter with the non-deterministic predicate is pushed beneath the aggregate operator (the one produced by distinct), which should not happen: a non-deterministic expression such as rand() must not be re-evaluated in a different position in the plan.
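To see why this pushdown is unsound, here is a minimal pure-Python sketch (no Spark required). It assumes the reported failure mode: once the filter is pushed beneath the aggregate, rand() ends up being evaluated independently for the filter predicate and for the projected column, so the displayed "random" value no longer matches the value the row was filtered on. The function names and the use of Python's random module are illustrative, not Spark internals.

```python
import random

# Correct plan: rand() is evaluated once per row, above the
# aggregate; the filter and the displayed column see the same draw.
def correct_plan(values, rng):
    with_random = [(v, rng.random()) for v in values]
    return [(v, r) for v, r in with_random if r > 0.3]

# Broken plan (filter pushed beneath the aggregate): the
# non-deterministic predicate is evaluated in one place and the
# projected column is re-evaluated in another, so the two draws
# disagree and rows violating the filter can appear in the output.
def pushed_down_plan(values, rng):
    passed = [v for v in values if rng.random() > 0.3]   # first draw
    return [(v, rng.random()) for v in passed]           # second draw

ok = correct_plan(range(1, 1000), random.Random(0))
assert all(r > 0.3 for _, r in ok)           # predicate holds in the output

bad = pushed_down_plan(range(1, 1000), random.Random(0))
print(any(r <= 0.3 for _, r in bad))         # True: output rows violate the filter
```

The correct plan never emits a row with random <= 0.3, while the pushed-down plan does, which is exactly the kind of wrong result the repro above shows.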
cc lian cheng