Spark / SPARK-27290

Remove unneeded sort under Aggregate


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      I have seen several tickets that remove unneeded sorts from the plan, but I think there is another case in which a sort is redundant:

      A Sort directly under a non-order-preserving node is redundant, for example:

      select count(*) from (select a1 from A order by a2);
      +- Aggregate
        +- Sort
           +- FileScan parquet
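The rule proposed above can be sketched on a toy plan model. This is a minimal, hypothetical sketch, not Catalyst's API: `Plan`, `Scan`, `Project`, `Sort`, `Aggregate` and `RemoveSortUnderAggregate` are illustrative names, and it assumes every aggregate function is order-insensitive (like `count` or `sum`). It drops any Sort that feeds an Aggregate, looking through Projects, which keep their child's order without imposing one.

```scala
// Hypothetical toy logical plan; these are NOT Spark's Catalyst classes.
sealed trait Plan
final case class Scan(table: String) extends Plan
final case class Project(columns: Seq[String], child: Plan) extends Plan
final case class Sort(keys: Seq[String], child: Plan) extends Plan
final case class Aggregate(groupBy: Seq[String], aggs: Seq[String], child: Plan) extends Plan

object RemoveSortUnderAggregate {
  // Removes Sorts in the subtree directly feeding an Aggregate. Projects are
  // looked through because they keep their child's order without adding one.
  // Assumption: all aggregate functions are order-insensitive (count, sum, ...).
  private def stripSorts(plan: Plan): Plan = plan match {
    case Sort(_, child)       => stripSorts(child)
    case Project(cols, child) => Project(cols, stripSorts(child))
    case other                => other // stop at Scans and nested Aggregates
  }

  // Rewrites every Aggregate in the tree. Sorts above an Aggregate are kept,
  // since they define the order of the final output.
  def apply(plan: Plan): Plan = plan match {
    case Aggregate(g, aggs, child) => Aggregate(g, aggs, stripSorts(apply(child)))
    case Sort(keys, child)         => Sort(keys, apply(child))
    case Project(cols, child)      => Project(cols, apply(child))
    case scan: Scan                => scan
  }
}
```

For the plan in the example, `RemoveSortUnderAggregate(Aggregate(Nil, Seq("count(*)"), Sort(Seq("a2"), Scan("A"))))` evaluates to `Aggregate(Nil, Seq("count(*)"), Scan("A"))`, while a Sort above an Aggregate is left in place.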
      

      However, one of the existing test cases conflicts with this example:

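      // testRelation, Optimize and comparePlans come from the enclosing test suite.
      test("sort should not be removed when there is a node which doesn't guarantee any order") {
         // Sort('a) over Project('a, 'b) over the test relation
         val orderedPlan = testRelation.select('a, 'b).orderBy('a.asc)
         // Aggregate on top of the ordered plan, then sort the aggregate's output again
         val groupedAndResorted = orderedPlan.groupBy('a)(sum('a)).orderBy('a.asc)
         val optimized = Optimize.execute(groupedAndResorted.analyze)
         // Expected: the optimizer leaves the plan unchanged, keeping both Sorts
         val correctAnswer = groupedAndResorted.analyze
         comparePlans(optimized, correctAnswer)
      }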
      

      Why is it designed this way? In my opinion, since Aggregate does not pass its input ordering up to its parent, the Sort below it is useless.
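The argument can be checked on plain Scala collections, outside Spark. In this hypothetical sketch (`OrderInsensitiveAggregate`, `groupAndSum` and `groupAndFirst` are illustrative names), grouping the same rows in two different input orders gives the same result for an order-insensitive aggregate such as sum, so sorting the input first changes nothing. An aggregate that keeps the first value per group does depend on input order, so the argument assumes the Aggregate contains only order-insensitive functions.

```scala
object OrderInsensitiveAggregate {
  // Groups (a, b) rows by `a` and sums `b`. The result is a Map,
  // so it carries no ordering, much like an Aggregate's output.
  def groupAndSum(rows: Seq[(Int, Int)]): Map[Int, Int] =
    rows.groupBy(_._1).map { case (a, group) => a -> group.map(_._2).sum }

  // Keeps the first `b` seen for each `a`; this result depends on input order.
  def groupAndFirst(rows: Seq[(Int, Int)]): Map[Int, Int] =
    rows.groupBy(_._1).map { case (a, group) => a -> group.head._2 }
}
```

For example, with `rows = Seq((1, 10), (2, 5), (1, 7))` and `sorted = rows.sortBy(_._2)`, `groupAndSum` returns `Map(1 -> 17, 2 -> 5)` for both inputs, whereas `groupAndFirst` returns 10 for key 1 on `rows` and 7 on `sorted`.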

People

    • Assignee: Unassigned
    • Reporter: Xiaoju Wu (xiaojuwu)
    • Votes: 0
    • Watchers: 3
