SPARK-33399

Normalize output partitioning and sortorder with respect to aliases to avoid unneeded exchange/sort nodes


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.7, 3.0.0, 3.0.1
    • Fix Version/s: 3.1.0
    • Component/s: SQL
    • Labels: None

    Description

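      Spark plans an unneeded Exchange when a Project follows an inner join and aliases the join keys: the aliased column is not matched against the join's existing output partitioning. Example: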

       

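      // Run in spark-shell, where `spark`, `sql`, and the $"..." column syntax are in scope.
      // The plan below implies spark.sql.shuffle.partitions=5 and that broadcast joins were
      // disabled; both settings are inferred from the plan, not stated in the report.
      spark.range(10).repartition($"id").createTempView("t1")
      spark.range(20).repartition($"id").createTempView("t2")
      spark.range(30).repartition($"id").createTempView("t3")
      
      // Join t1 and t2 on id, alias both keys in the subquery, then join the result with t3.
      val planned = sql(
         """
           |SELECT t2id, t3.id as t3id
           |FROM (
           |    SELECT t1.id as t1id, t2.id as t2id
           |    FROM t1, t2
           |    WHERE t1.id = t2.id
           |) t12, t3
           |WHERE t1id = t3.id
         """.stripMargin).queryExecution.executedPlan

      Executed plan (the arrow marks the Exchange in question):
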
      *(9) Project [t2id#1034L, id#1004L AS t3id#1035L]
      +- *(9) SortMergeJoin [t1id#1033L], [id#1004L], Inner
         :- *(6) Sort [t1id#1033L ASC NULLS FIRST], false, 0
         :  +- Exchange hashpartitioning(t1id#1033L, 5), true, [id=#1343] <---------------
         :     +- *(5) Project [id#996L AS t1id#1033L, id#1000L AS t2id#1034L]
         :        +- *(5) SortMergeJoin [id#996L], [id#1000L], Inner
         :           :- *(2) Sort [id#996L ASC NULLS FIRST], false, 0
         :           :  +- Exchange hashpartitioning(id#996L, 5), true, [id=#1329]
         :           :     +- *(1) Range (0, 10, step=1, splits=2)
         :           +- *(4) Sort [id#1000L ASC NULLS FIRST], false, 0
         :              +- Exchange hashpartitioning(id#1000L, 5), true, [id=#1335]
         :                 +- *(3) Range (0, 20, step=1, splits=2)
         +- *(8) Sort [id#1004L ASC NULLS FIRST], false, 0
            +- Exchange hashpartitioning(id#1004L, 5), true, [id=#1349]
               +- *(7) Range (0, 30, step=1, splits=2)

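      The marked Exchange, hashpartitioning(t1id#1033L, 5), can be removed. t1id#1033L is only an alias of id#996L, and the output of the inner SortMergeJoin is already hash-partitioned on id#996L into 5 partitions, so the data is reshuffled on the same values.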
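
      The idea in the issue title, normalizing output partitioning with respect to aliases, can be illustrated with a small self-contained sketch. This is a toy model under stated assumptions and not Spark's implementation: Attr, Rename, HashPart, normalize and satisfies are hypothetical names invented for this example, and only hash partitioning on plain attributes is modeled.

```scala
object AliasAwarePartitioning {
  // Hypothetical stand-ins for plan attributes, Project aliases and hash partitioning.
  final case class Attr(name: String, exprId: Long)
  final case class Rename(child: Attr, as: Attr)
  final case class HashPart(keys: Seq[Attr], numPartitions: Int)

  /** Rewrites each partitioning key to its alias when the Project renames it. */
  def normalize(p: HashPart, renames: Seq[Rename]): HashPart = {
    val byChild = renames.map(r => r.child -> r.as).toMap
    p.copy(keys = p.keys.map(k => byChild.getOrElse(k, k)))
  }

  /** True when data partitioned by `p` is already clustered on `required`. */
  def satisfies(p: HashPart, required: Seq[Attr], numPartitions: Int): Boolean =
    p.numPartitions == numPartitions && p.keys.nonEmpty && p.keys.forall(required.contains)

  def main(args: Array[String]): Unit = {
    // Attributes taken from the plan in the description.
    val id996  = Attr("id", 996L)
    val id1000 = Attr("id", 1000L)
    val t1id   = Attr("t1id", 1033L)
    val t2id   = Attr("t2id", 1034L)

    // Partitioning produced below the Project, and the Project's aliases.
    val joinOutput    = HashPart(Seq(id996), 5)
    val renames       = Seq(Rename(id996, t1id), Rename(id1000, t2id))
    val projectOutput = normalize(joinOutput, renames)

    // The outer join requires clustering on t1id with 5 partitions.
    println(satisfies(joinOutput, Seq(t1id), 5))    // false: an Exchange would be added
    println(satisfies(projectOutput, Seq(t1id), 5)) // true: no Exchange needed
  }
}
```

      In this model the Project reports its partitioning in terms of the names it outputs, so the requirement of the outer join is met without another shuffle. The same reasoning would apply to sort orders, which the title also mentions, but they are left out of the sketch.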

       

          People

            Assignee: Prakhar Jain (prakharjain09)
            Reporter: Prakhar Jain (prakharjain09)