Spark / SPARK-20246

Should check determinism when pushing predicates down through aggregation


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.0.3, 2.1.2, 2.2.0
    • Component/s: SQL

    Description

      import org.apache.spark.sql.functions._
      spark.range(1,1000).distinct.withColumn("random", rand()).filter(col("random") > 0.3).orderBy("random").show

      gives a wrong result.

      The optimized logical plan shows that the filter with the non-deterministic predicate is pushed beneath the aggregate operator, which should not happen.

      cc lian cheng
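
The root cause can be sketched without Spark: the plain-Python simulation below (an illustration of the plan semantics, not Spark's implementation; the data and function names are made up) contrasts evaluating a seeded `rand() > 0.3` predicate above versus beneath a `distinct` (aggregate) step.

```python
import random

# Input with duplicates; `distinct` plays the role of the aggregate
# in the issue's query (the values here are hypothetical).
ROWS = [1, 1, 2, 2, 3, 3]

def filter_after_distinct(rows, seed):
    """Correct plan: the filter sits above the aggregate, so the
    random draw happens once per *distinct* value."""
    rng = random.Random(seed)
    return [v for v in sorted(set(rows)) if rng.random() > 0.3]

def filter_pushed_below_distinct(rows, seed):
    """Buggy plan: the filter is pushed beneath the aggregate, so
    the random draw happens once per *input* row, and a value
    survives whenever ANY of its duplicates passes."""
    rng = random.Random(seed)
    kept = [v for v in rows if rng.random() > 0.3]
    return sorted(set(kept))
```

With seed 1 the two plans disagree: `filter_after_distinct(ROWS, 1)` returns `[2, 3]`, while `filter_pushed_below_distinct(ROWS, 1)` returns `[1, 2, 3]` — the pushed-down plan keeps 1 because one of its duplicate rows happened to pass the predicate. This is why the optimizer must check that a predicate is deterministic before pushing it through an aggregation.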

    People

      Assignee: Wenchen Fan (cloud_fan)
      Reporter: Weiluo Ren (weiluo_ren123)
      Votes: 0
      Watchers: 5
