  Spark / SPARK-20246

Should check determinism when pushing predicates down through aggregation

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.0.3, 2.1.2, 2.2.0
    • Component/s: SQL
    • Labels:

      Description

      import org.apache.spark.sql.functions._
      spark.range(1,1000).distinct.withColumn("random", rand()).filter(col("random") > 0.3).orderBy("random").show

      gives a wrong result.

      In the optimized logical plan, the Filter with the non-deterministic predicate is pushed beneath the Aggregate operator, which should not happen: a non-deterministic expression such as rand() produces a different value on each evaluation, so moving the filter changes where, and over which rows, the predicate is evaluated, and therefore changes the query result.
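      The hazard can be sketched in plain Scala, outside Spark: a predicate that draws a fresh random number on every call keeps different rows depending on whether it runs before or after deduplication (standing in for the aggregate). The helper name, sample data, and fixed seed below are illustrative and not part of the reported repro.

```scala
import scala.util.Random

// A non-deterministic predicate: every call draws a fresh random value,
// so two evaluation orders over the same data can keep different rows.
// (The seed is fixed only to make this sketch reproducible.)
def randomPredicate(seed: Int): Int => Boolean = {
  val rng = new Random(seed)
  _ => rng.nextDouble() > 0.5
}

val data = Seq(1, 1, 2, 2, 3, 3)

// Intended order: deduplicate first (standing in for the aggregate), then
// filter -- the predicate is evaluated once per distinct row (3 draws).
val filterAboveAggregate = data.distinct.filter(randomPredicate(42))

// Pushed-down order: the predicate is evaluated once per input row,
// duplicates included (6 draws), so a different set of rows can survive.
val filterBelowAggregate = data.filter(randomPredicate(42)).distinct

println(s"filter above aggregate: $filterAboveAggregate")
println(s"filter below aggregate: $filterBelowAggregate")
```

      With the fixed seed the two orders keep different rows, which is exactly why the optimizer must not push a filter containing a non-deterministic predicate through an aggregation.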

      cc Cheng Lian


            People

            • Assignee: Wenchen Fan (cloud_fan)
            • Reporter: Weiluo Ren (weiluo_ren123)
            • Votes: 0
            • Watchers: 5
