Spark / SPARK-20246

Should check determinism when pushing predicates down through aggregation


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.0.3, 2.1.2, 2.2.0
    • Component/s: SQL

    Description

      import org.apache.spark.sql.functions._
      spark.range(1,1000).distinct.withColumn("random", rand()).filter(col("random") > 0.3).orderBy("random").show

      gives a wrong result.

      The optimized logical plan shows that the filter with the non-deterministic predicate is pushed beneath the aggregate operator, which should not happen.
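      The failure mode can be sketched without Spark. Below is a hypothetical plain-Scala simulation (the object name, seed, and row shape are invented for illustration), where a seeded scala.util.Random stands in for rand(): re-evaluating the non-deterministic expression in a pushed-down filter means the value tested is not the value that ends up in the output column.

      ```scala
      import scala.util.Random

      // Sketch only (plain Scala, not Spark): a seeded RNG stands in for rand().
      object NonDeterministicPushdown {
        private val rng = new Random(42)

        // (id, random) rows: the "random" column is materialized once,
        // as withColumn("random", rand()) does.
        val rows: Seq[(Int, Double)] = (1 until 1000).map(i => (i, rng.nextDouble()))

        // Correct plan: the filter tests the same value the query later sorts and shows,
        // so every surviving row satisfies random > 0.3.
        def correctPlan: Seq[(Int, Double)] =
          rows.filter { case (_, r) => r > 0.3 }.sortBy { case (_, r) => r }

        // Buggy plan: after pushdown the non-deterministic predicate is re-evaluated,
        // so the value tested is a fresh draw, not the stored "random" column; rows
        // with random <= 0.3 can leak into the output.
        def buggyPlan: Seq[(Int, Double)] =
          rows.filter { case _ => rng.nextDouble() > 0.3 }.sortBy { case (_, r) => r }
      }
      ```

      Because rand() yields a fresh value on every evaluation, an optimizer has to check that a predicate is deterministic (Catalyst exposes this as Expression.deterministic) before pushing it through another operator.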

      cc Cheng Lian

      Attachments

        Activity


          People

            Assignee: Wenchen Fan
            Reporter: Weiluo Ren
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved:
