[SPARK-11135] Exchange sort-planning logic incorrectly avoid sorts when existing ordering is non-empty subset of required ordering - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.5.2, 1.6.0
Component/s: SQL
Labels:
None

Target Version/s:

1.5.2, 1.6.0

Description

In Spark SQL, the Exchange planner tries to avoid unnecessary sorts in cases where the data has already been sorted by a superset of the requested sorting columns. For instance, let's say that a query calls for an operator's input to be sorted by `a.asc` and the input happens to already be sorted by `[a.asc, b.asc]`. In this case, we do not need to re-sort the input. The converse, however, is not true: if the query calls for `[a.asc, b.asc]`, then `a.asc` alone will not satisfy the ordering requirements, requiring an additional sort to be planned by Exchange.

However, the current Exchange code gets this wrong and incorrectly skips sorting when the existing output ordering is a subset of the required ordering. This is simple to fix, however.

This bug was introduced in https://github.com/apache/spark/pull/7458, so it affects 1.5.0+.

Attachments

Issue Links

links to

[Github] Pull Request #9140 (JoshRosen)

Activity

People

Assignee:: Josh Rosen

Reporter:: Josh Rosen

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 15/Oct/15 21:26

Updated:: 16/Oct/15 00:37

Resolved:: 16/Oct/15 00:37