spark.range(1,1000).distinct.withColumn("random", rand()).filter(col("random") > 0.3).orderBy("random").show
gives a wrong result.
In the optimized logical plan, the filter with the non-deterministic predicate is pushed beneath the aggregate operator, which should not happen: because rand() is non-deterministic, the pushed-down predicate is evaluated independently of the projected "random" column, so rows whose displayed "random" value is <= 0.3 can still appear in the output (see the sketch below).
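For reference, a minimal sketch for reproducing the issue and inspecting the plans outside spark-shell; the explicit imports, the local SparkSession, and the explain(true) call are the only additions to the snippet above and are assumptions for a standalone run:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, rand}

// Hypothetical local session just for the repro.
val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()

val df = spark.range(1, 1000)
  .distinct()
  .withColumn("random", rand())
  .filter(col("random") > 0.3)
  .orderBy("random")

// explain(true) prints the parsed, analyzed, and optimized logical plans plus the
// physical plan; the Filter carrying rand() appears below the Aggregate in the
// optimized plan.
df.explain(true)

// With the mis-placed filter, rows whose shown "random" value is <= 0.3 can still
// appear in the result.
df.show()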
cc Cheng Lian
[Github] Pull Request #17559 (viirya)
[Github] Pull Request #17562 (cloud-fan)