SPARK-25401

Reorder the required ordering to match the table's output ordering for bucket join

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      Currently, the method orderingSatisfies decides whether a SortExec is needed between an operator and its child operator. It requires the SortOrders of the required ordering and of the child's output ordering to appear in exactly the same order.
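The positional nature of this check can be illustrated with a small model. This is a simplified sketch in plain Python, not Spark's actual code: orderings are modeled as lists of column names, sort direction and null ordering are ignored, and the function name `ordering_satisfies` and the column names `k1`, `k2` are hypothetical.

```python
from typing import Sequence


def ordering_satisfies(provided: Sequence[str], required: Sequence[str]) -> bool:
    """Simplified positional check: `required` must be a prefix of `provided`."""
    if len(required) > len(provided):
        return False
    # Compare position by position; the same columns in another order fail.
    return all(p == r for p, r in zip(provided, required))


# A child sorted by (k2, k1) does not satisfy a required ordering of (k1, k2),
# even though both orderings cover the same set of columns.
print(ordering_satisfies(["k2", "k1"], ["k1", "k2"]))  # False
print(ordering_satisfies(["k2", "k1"], ["k2"]))        # True
```

Under this model, any mismatch in position is treated as "not sorted", which is what causes a sort to be planned.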

      However, consider the following case.

      • Table a is bucketed by (a1, a2) into 200 buckets and sorted by (a2, a1).
      • Table b is bucketed by (b1, b2) into 200 buckets and sorted by (b2, b1).
      • Table a is joined with table b on (a1=b1, a2=b2).

      In this case, if the join is a sort merge join, the query planner adds no exchange on either side, but it still adds a sort on both sides. The sort is also unnecessary: within matching buckets, such as bucket 1 of table a and bucket 1 of table b, the join condition (a1=b1, a2=b2) is equivalent to (a2=b2, a1=b1), so the required ordering can be reordered to match the ordering the tables already provide.
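The proposed reordering can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not Spark's planner code: the function `reorder_join_keys` is hypothetical, columns are plain strings, sort direction is ignored, and reordering is attempted only when each table's sort columns are exactly a permutation of its join keys.

```python
from typing import List, Sequence, Tuple


def reorder_join_keys(
    left_keys: Sequence[str],
    right_keys: Sequence[str],
    left_sorted_by: Sequence[str],
    right_sorted_by: Sequence[str],
) -> Tuple[List[str], List[str]]:
    """Permute the join key pairs to follow the left table's sort order.

    Falls back to the original key order when no safe permutation exists.
    """
    original = (list(left_keys), list(right_keys))
    # Only reorder when the left sort columns are a permutation of the join keys.
    if len(set(left_keys)) != len(left_keys):
        return original
    if sorted(left_sorted_by) != sorted(left_keys):
        return original
    # Keep each left key paired with its right key while permuting.
    pair_of = dict(zip(left_keys, right_keys))
    new_left = list(left_sorted_by)
    new_right = [pair_of[k] for k in new_left]
    # The right table must already be sorted in the matching order as well.
    if new_right != list(right_sorted_by):
        return original
    return new_left, new_right


left, right = reorder_join_keys(
    ["a1", "a2"], ["b1", "b2"],  # join on (a1=b1, a2=b2)
    ["a2", "a1"], ["b2", "b1"],  # tables sorted by (a2, a1) and (b2, b1)
)
print(left, right)  # ['a2', 'a1'] ['b2', 'b1']
```

With the keys rewritten to (a2=b2, a1=b1), the required ordering on each side matches the ordering the bucketed tables already provide, so a positional check would pass and no extra sort would be needed.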

          People

            Assignee: Unassigned
            Reporter: Wang, Gang (gwang3)
            Votes: 0
            Watchers: 5
