Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-33399

Normalize output partitioning and sortorder with respect to aliases to avoid unneeded exchange/sort nodes

    XMLWordPrintableJSON

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.7, 3.0.0, 3.0.1
    • Fix Version/s: 3.1.0
    • Component/s: SQL
    • Labels:
      None

      Description

      Spark introduces unneeded exchanges if there is a Project after Inner join. Example:

       

      spark.range(10).repartition($"id").createTempView("t1")
      spark.range(20).repartition($"id").createTempView("t2")
      spark.range(30).repartition($"id").createTempView("t3")
      
      val planned = sql(
         """
           |SELECT t2id, t3.id as t3id
           |FROM (
           |    SELECT t1.id as t1id, t2.id as t2id
           |    FROM t1, t2
           |    WHERE t1.id = t2.id
           |) t12, t3
           |WHERE t1id = t3.id
         """.stripMargin).queryExecution.executedPlan
      
      
      *(9) Project [t2id#1034L, id#1004L AS t3id#1035L]
      +- *(9) SortMergeJoin [t1id#1033L], [id#1004L], Inner
         :- *(6) Sort [t1id#1033L ASC NULLS FIRST], false, 0
         :  +- Exchange hashpartitioning(t1id#1033L, 5), true, [id=#1343] <---------------
         :     +- *(5) Project [id#996L AS t1id#1033L, id#1000L AS t2id#1034L]
         :        +- *(5) SortMergeJoin [id#996L], [id#1000L], Inner
         :           :- *(2) Sort [id#996L ASC NULLS FIRST], false, 0
         :           :  +- Exchange hashpartitioning(id#996L, 5), true, [id=#1329]
         :           :     +- *(1) Range (0, 10, step=1, splits=2)
         :           +- *(4) Sort [id#1000L ASC NULLS FIRST], false, 0
         :              +- Exchange hashpartitioning(id#1000L, 5), true, [id=#1335]
         :                 +- *(3) Range (0, 20, step=1, splits=2)
         +- *(8) Sort [id#1004L ASC NULLS FIRST], false, 0
            +- Exchange hashpartitioning(id#1004L, 5), true, [id=#1349]
               +- *(7) Range (0, 30, step=1, splits=2)

      The marked exchange in the above plan can be removed.

       

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                prakharjain09 Prakhar Jain
                Reporter:
                prakharjain09 Prakhar Jain
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: