SPARK-33400

Normalize sameOrderExpressions in SortOrder


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.7, 3.0.0, 3.0.1
    • Fix Version/s: 3.1.0
    • Component/s: SQL
    • Labels: None

    Description

      When a SortMergeJoin is followed by a Project with aliases, the outputOrdering is not propagated properly, which in some cases leads to an unnecessary Sort operation:

       

       

      // Run in spark-shell, where `spark`, `sql` and the $"..." column syntax are already in scope.
      spark.range(10).repartition($"id").createTempView("t1")
      spark.range(20).repartition($"id").createTempView("t2")
      spark.range(30).repartition($"id").createTempView("t3")
      
      val planned = sql(
         """
           |SELECT t2id, t3.id as t3id
           |FROM (
           |    SELECT t1.id as t1id, t2.id as t2id
           |    FROM t1, t2
           |    WHERE t1.id = t2.id
           |) t12, t3
           |WHERE t2id = t3.id
         """.stripMargin).queryExecution.executedPlan
      
      
      
      *(8) Project [t2id#1059L, id#1004L AS t3id#1060L]
      +- *(8) SortMergeJoin [t2id#1059L], [id#1004L], Inner
         :- *(5) Sort [t2id#1059L ASC NULLS FIRST ], false, 0  <-----------------------
         :  +- *(5) Project [id#1000L AS t2id#1059L]
         :     +- *(5) SortMergeJoin [id#996L], [id#1000L], Inner
         :        :- *(2) Sort [id#996L ASC NULLS FIRST ], false, 0
         :        :  +- Exchange hashpartitioning(id#996L, 5), true, [id=#1426]
         :        :     +- *(1) Range (0, 10, step=1, splits=2)
         :        +- *(4) Sort [id#1000L ASC NULLS FIRST ], false, 0
         :           +- Exchange hashpartitioning(id#1000L, 5), true, [id=#1432]
         :              +- *(3) Range (0, 20, step=1, splits=2)
         +- *(7) Sort [id#1004L ASC NULLS FIRST ], false, 0
            +- Exchange hashpartitioning(id#1004L, 5), true, [id=#1443]
               +- *(6) Range (0, 30, step=1, splits=2)
      
      

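      The Sort node marked above could have been avoided: t2id#1059L is only an alias of id#1000L, and the inner SortMergeJoin already produces output ordered on its join keys, id#996L and id#1000L.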
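
      One way to check for the redundant Sort without reading the plan by eye is to count the SortExec nodes in the executed plan. The following is a minimal sketch, not the test shipped with the fix. The configuration values are assumptions inferred from the plan above (sort-merge joins forced by disabling broadcast joins, 5 shuffle partitions, adaptive execution off so that executedPlan is a plain operator tree), and the application name and the helper countSortNodes are hypothetical. The query joins on t2id, as the plan above does.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.{SortExec, SparkPlan}

// Assumed settings, chosen to reproduce the shape of the plan in the description.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("spark-33400-sort-count") // hypothetical name
  .config("spark.sql.autoBroadcastJoinThreshold", "-1") // disable broadcast joins
  .config("spark.sql.shuffle.partitions", "5")
  .config("spark.sql.adaptive.enabled", "false")
  .getOrCreate()
import spark.implicits._

spark.range(10).repartition($"id").createOrReplaceTempView("t1")
spark.range(20).repartition($"id").createOrReplaceTempView("t2")
spark.range(30).repartition($"id").createOrReplaceTempView("t3")

val planned: SparkPlan = spark.sql(
  """
    |SELECT t2id, t3.id as t3id
    |FROM (
    |    SELECT t1.id as t1id, t2.id as t2id
    |    FROM t1, t2
    |    WHERE t1.id = t2.id
    |) t12, t3
    |WHERE t2id = t3.id
  """.stripMargin).queryExecution.executedPlan

// Hypothetical helper: walks the physical plan and counts its Sort operators.
def countSortNodes(plan: SparkPlan): Int =
  plan.collect { case s: SortExec => s }.size

val sortCount = countSortNodes(planned)
println(s"Sort nodes in plan: $sortCount")
```

      The plan shown in the description contains four Sort nodes. If the ordering were propagated through the alias, one would expect only the three Sorts that sit directly above the Exchange nodes to remain.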

       

          People

            Assignee: Prakhar Jain (prakharjain09)
            Reporter: Prakhar Jain (prakharjain09)