Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
1.3.0, 1.4.0, 1.5.0
None
Spark 1.5 release
Description
Consider SortMergeJoin (SMJ), which requires a sorted, clustered distribution of its input rows. Suppose both of SMJ's children produce unsorted output, but each has only a single partition. A single partition already satisfies the clustering requirement, so in this case we need to inject sort operators but should not need to inject exchanges. Unfortunately, it looks like Exchange unnecessarily repartitions the children using a hash partitioning.
We should update Exchange so that it does not unnecessarily repartition children when only the ordering requirements are unsatisfied.
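The intended behavior can be sketched with a toy planner. This is a minimal illustration, not Spark's actual Scala code: `Plan`, `satisfies_clustering`, `ensure_requirements`, and the `shuffle_partitions` default are hypothetical names and values invented for the example, and it handles one child at a time, ignoring the check that both join children have compatible partitioning.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Plan:
    """Toy physical operator; all names here are hypothetical."""
    name: str
    num_partitions: int
    clustered_by: Optional[Tuple[str, ...]] = None  # hash-partitioning keys, if any
    sorted_by: Optional[Tuple[str, ...]] = None     # per-partition sort keys, if any
    child: Optional["Plan"] = None


def satisfies_clustering(plan: Plan, keys: Tuple[str, ...]) -> bool:
    # A single partition trivially co-locates all rows with equal keys.
    return plan.num_partitions == 1 or plan.clustered_by == keys


def ensure_requirements(child: Plan, keys: Tuple[str, ...],
                        shuffle_partitions: int = 200) -> Plan:
    plan = child
    # Shuffle only when the distribution requirement is unsatisfied.
    if not satisfies_clustering(plan, keys):
        plan = Plan("Exchange", shuffle_partitions, clustered_by=keys, child=plan)
    # Handle ordering separately: a missing sort alone never forces a shuffle.
    if plan.sorted_by != keys:
        plan = Plan("Sort", plan.num_partitions, clustered_by=plan.clustered_by,
                    sorted_by=keys, child=plan)
    return plan


def operators(plan: Optional[Plan]) -> list:
    """List operator names from the top of the plan down to the leaf."""
    names = []
    while plan is not None:
        names.append(plan.name)
        plan = plan.child
    return names


single = Plan("Scan", num_partitions=1)
multi = Plan("Scan", num_partitions=4)
print(operators(ensure_requirements(single, ("k",))))  # ['Sort', 'Scan']
print(operators(ensure_requirements(multi, ("k",))))   # ['Sort', 'Exchange', 'Scan']
```

In this model the single-partition child gets only a sort, while the unsorted multi-partition child gets an exchange followed by a sort.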
I'd like to fix this for Spark 1.5 since it makes certain types of unit tests easier to write.