  1. Spark
  2. SPARK-37292

Removes outer join if it only has DISTINCT on streamed side with alias


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

          spark.range(200L).selectExpr("id AS a").createTempView("t1")
          spark.range(300L).selectExpr("id AS b").createTempView("t2")
          spark.sql("SELECT DISTINCT a AS newAlias FROM t1 LEFT JOIN t2 ON a = b").explain(true)
      

      Expected optimized plan:

      == Optimized Logical Plan ==
      Aggregate [newAlias#8L], [newAlias#8L]
      +- Project [id#0L AS newAlias#8L]
         +- Range (0, 200, step=1, splits=Some(2))
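
      The rewrite appears to rest on the fact that a LEFT OUTER JOIN keeps every row of the left side at least once and can only duplicate it, so a DISTINCT over left-side columns yields the same result with or without the join. The following is a minimal sketch of that equivalence using plain Scala collections rather than Spark; the object name DistinctOverLeftJoin, the helper leftJoin, and the sample data are hypothetical and serve only as illustration, not as the optimizer rule itself.

```scala
// Illustrative only: plain Scala collections stand in for tables t1 and t2.
object DistinctOverLeftJoin {
  // LEFT OUTER JOIN ON a = b: every left row appears at least once;
  // a left row without a match is paired with None.
  def leftJoin(left: Seq[Long], right: Seq[Long]): Seq[(Long, Option[Long])] =
    left.flatMap { a =>
      val matches = right.filter(_ == a)
      if (matches.isEmpty) Seq((a, Option.empty[Long]))
      else matches.map(b => (a, Option(b)))
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data with duplicates on both sides.
    val t1 = Seq(1L, 2L, 2L, 3L)
    val t2 = Seq(2L, 2L, 4L)

    // SELECT DISTINCT a AS newAlias FROM t1 LEFT JOIN t2 ON a = b
    val withJoin = leftJoin(t1, t2).map(_._1).distinct
    // SELECT DISTINCT a AS newAlias FROM t1
    val withoutJoin = t1.distinct

    // Both queries produce the same set of values.
    assert(withJoin.toSet == withoutJoin.toSet)
    println(withJoin.sorted)
  }
}
```

      The sketch only reads the left-side column after the join, which is the situation in the query above; if the projection referenced b, the join could not be dropped this way.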
      

            People

              Assignee: Yuming Wang (yumwang)
              Reporter: Yuming Wang (yumwang)
              Votes: 0
              Watchers: 3
