  1. Spark
  2. SPARK-27290

Remove unneeded sort under Aggregate


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None

      Description

      I have seen several tickets that remove unneeded sorts from the plan, and I think there is another case in which a sort is redundant:

      A Sort directly under a node that does not preserve ordering is redundant, for example:

      select count(*) from (select a1 from A order by a2);
      +- Aggregate
        +- Sort
           +- FileScan parquet
      

      But one of the existing test cases conflicts with this example:

      test("sort should not be removed when there is a node which doesn't guarantee any order") {
         val orderedPlan = testRelation.select('a, 'b).orderBy('a.asc)   
         val groupedAndResorted = orderedPlan.groupBy('a)(sum('a)).orderBy('a.asc)
         val optimized = Optimize.execute(groupedAndResorted.analyze)
         val correctAnswer = groupedAndResorted.analyze
         comparePlans(optimized, correctAnswer) 
      }
      

      Why was the test designed this way? In my opinion, since Aggregate does not pass its child's ordering up the plan, the Sort below it is useless.
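To make the proposal concrete, here is a minimal sketch on a toy plan model. The `Plan`, `Scan`, `Project`, `Sort`, and `Aggregate` types, the `orderSensitive` flag, and the `RemoveSortUnderAggregate` object are all hypothetical names for illustration; they are not Catalyst classes and this is not how the optimizer is implemented. The sketch also assumes the Sort is only safe to drop when every aggregate function ignores input order (count, sum). Functions such as first, last, or collect_list can depend on input order, so the sketch keeps the Sort for them.

```scala
// Toy logical plan. These types are illustrative only, not Spark's Catalyst classes.
sealed trait Plan
case class Scan(table: String) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan
case class Sort(keys: Seq[String], child: Plan) extends Plan
// orderSensitive = true models aggregates whose result may depend on input order
// (for example first, last, collect_list). This flag is an assumption of the sketch.
case class Aggregate(
    groupBy: Seq[String],
    functions: Seq[String],
    orderSensitive: Boolean,
    child: Plan) extends Plan

object RemoveSortUnderAggregate {
  def apply(plan: Plan): Plan = plan match {
    // An order-insensitive Aggregate directly over a Sort: drop the Sort and retry,
    // so that several stacked Sorts are all removed.
    case Aggregate(g, f, false, Sort(_, child)) =>
      apply(Aggregate(g, f, orderSensitive = false, child = child))
    // Any other Aggregate: keep it and keep rewriting below it.
    case a: Aggregate => a.copy(child = apply(a.child))
    case Sort(keys, child) => Sort(keys, apply(child))
    case Project(cols, child) => Project(cols, apply(child))
    case s: Scan => s
  }
}

object Example {
  // select count(*) from (select a1 from A order by a2)
  val counted: Plan =
    Aggregate(Nil, Seq("count(*)"), orderSensitive = false,
      child = Sort(Seq("a2"), Scan("A")))

  // Same shape as the plan in the existing test case.
  val groupedAndResorted: Plan =
    Sort(Seq("a"),
      Aggregate(Seq("a"), Seq("sum(a)"), orderSensitive = false,
        child = Sort(Seq("a"), Project(Seq("a", "b"), Scan("testRelation")))))

  // An order-sensitive aggregate: the Sort below it must stay.
  val firstValue: Plan =
    Aggregate(Nil, Seq("first(a1)"), orderSensitive = true,
      child = Sort(Seq("a2"), Scan("A")))
}
```

Under these assumptions, applying the rule to `Example.groupedAndResorted` removes only the Sort below the Aggregate and keeps the one above it, which is the outcome this ticket argues for and the opposite of what the existing test asserts.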


              People

              • Assignee: Unassigned
              • Reporter: Xiaoju Wu (xiaojuwu)
              • Votes: 0
              • Watchers: 3
