Details
- Improvement
- Status: Open
- Minor
- Resolution: Unresolved
- 3.1.0
- None
- None
Description
I have seen several tickets about removing unneeded sorts from the plan, and I think there is another case in which a sort is redundant:
A Sort directly under a node that does not preserve ordering is redundant, for example:
select count(*) from (select a1 from A order by a2);

+- Aggregate
   +- Sort
      +- FileScan parquet
But one of the existing test cases conflicts with this example:
test("sort should not be removed when there is a node which doesn't guarantee any order") {
  val orderedPlan = testRelation.select('a, 'b).orderBy('a.asc)
  val groupedAndResorted = orderedPlan.groupBy('a)(sum('a)).orderBy('a.asc)
  val optimized = Optimize.execute(groupedAndResorted.analyze)
  val correctAnswer = groupedAndResorted.analyze
  comparePlans(optimized, correctAnswer)
}
Why is it designed like this? In my opinion, since Aggregate does not pass its child's ordering up, the Sort below it is useless.
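To make the proposal concrete, here is a minimal sketch of the idea on a toy plan tree. It is not Spark's Catalyst code: the node classes (Scan, Sort, Project, Aggregate) and the RemoveRedundantSorts object are hypothetical stand-ins, and it assumes that Project is the only node that passes its child's ordering up. It also ignores aggregates whose result depends on input order (for example first or collect_list), which a real rule would have to exclude.

```scala
// Toy plan nodes (hypothetical; NOT Spark's Catalyst classes).
sealed trait Plan
case class Scan(table: String) extends Plan
case class Sort(key: String, child: Plan) extends Plan
case class Project(cols: Seq[String], child: Plan) extends Plan
case class Aggregate(groupBy: Seq[String], child: Plan) extends Plan

object RemoveRedundantSorts {
  // orderNeeded says whether some ancestor still depends on this subtree's ordering.
  // Assumption: the root's ordering matters, Project preserves ordering,
  // and Aggregate and Sort both discard whatever ordering their child had.
  def optimize(plan: Plan, orderNeeded: Boolean = true): Plan = plan match {
    // Nobody above observes this ordering, so drop the Sort.
    case Sort(_, child) if !orderNeeded =>
      optimize(child, orderNeeded = false)
    // This Sort is kept, but it re-sorts its input, so the child's order is irrelevant.
    case Sort(key, child) =>
      Sort(key, optimize(child, orderNeeded = false))
    // Project passes ordering through unchanged.
    case Project(cols, child) =>
      Project(cols, optimize(child, orderNeeded))
    // Aggregate does not pass ordering up, so sorts below it are redundant.
    case Aggregate(groupBy, child) =>
      Aggregate(groupBy, optimize(child, orderNeeded = false))
    case scan: Scan =>
      scan
  }
}
```

Under these assumptions, the plan from the query above, Aggregate(Nil, Sort("a2", Scan("A"))), becomes Aggregate(Nil, Scan("A")). For the shape used in the existing test, Sort over Aggregate over Sort, the outer Sort is kept and only the Sort below the Aggregate is removed.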
Issue Links
- duplicates
  SPARK-29343 Eliminate sorts without limit in the subquery of Join/Aggregation (Resolved)
- relates to
  SPARK-23375 Optimizer should remove unneeded Sort (Resolved)