[SPARK-21998] SortMergeJoinExec did not calculate its outputOrdering correctly during physical planning - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.3.0
Component/s: SQL
Labels:
None

Description

Right now the calculation of SortMergeJoinExec's outputOrdering relies on the fact that its children have already been sorted on the join keys, while this is often not true until EnsureRequirements has been applied.

  /**
   * For SMJ, child's output must have been sorted on key or expressions with the same order as
   * key, so we can get ordering for key from child's output ordering.
   */
  private def getKeyOrdering(keys: Seq[Expression], childOutputOrdering: Seq[SortOrder])
    : Seq[SortOrder] = {
    keys.zip(childOutputOrdering).map { case (key, childOrder) =>
      SortOrder(key, Ascending, childOrder.sameOrderExpressions + childOrder.child - key)
    }
  }

Thus SortMergeJoinExec's outputOrdering is most likely not correct during the physical planning stage, and as a result, potential physical optimizations that rely on the required/output orderings, like ~~SPARK-18591~~, will not work for SortMergeJoinExec.
The right behavior of getKeyOrdering(keys, childOutputOrdering) should be:
1. If the childOutputOrdering satisfies (is a superset of) the required child ordering => childOutputOrdering
2. Otherwise => required child ordering

Attachments

Issue Links

links to

[Github] Pull Request #19281 (maryannxue)

Activity

People

Assignee:: Wei Xue

Reporter:: Wei Xue

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 13/Sep/17 19:17

Updated:: 22/Sep/17 06:58

Resolved:: 22/Sep/17 06:58