SPARK-17271

Planner adds unnecessary Sort even if child ordering is semantically the same as required ordering


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.2, 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      Found a case where the planner adds an unneeded Sort operation because of a bug in the way `SortOrder` instances are compared at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253

      `SortOrder` needs to be compared semantically because the `Expression`s within two `SortOrder`s can be "semantically equal" without being literally equal objects.

      For example, consider `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")`.

      Expression in required SortOrder:

            AttributeReference(
              name = "col1",
              dataType = LongType,
              nullable = false
            ) (exprId = exprId,
              qualifier = Some("a")
            )
      

      Expression in child SortOrder:

            AttributeReference(
              name = "col1",
              dataType = LongType,
              nullable = false
            ) (exprId = exprId)
      

      Notice that the output column (the expression in the required ordering) has a qualifier while the child attribute does not. The underlying expression is the same, however, so in this case the child satisfies the required sort order.
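
      The difference between the two comparisons can be sketched with simplified stand-ins for the Catalyst classes. Everything below (`Attr`, `Order`, `literallySatisfied`, `semanticallySatisfied`) is a hypothetical model written for illustration, not Spark's actual code or the actual fix; it assumes that semantic equality of attributes depends only on `exprId` and ignores the qualifier, as the description above implies.

```scala
object SortOrderSketch {
  // Hypothetical stand-in for AttributeReference (NOT Spark's real class).
  final case class Attr(name: String, exprId: Long, qualifier: Option[String] = None) {
    // Assumption: semantic equality ignores cosmetic fields such as the qualifier.
    def semanticEquals(other: Attr): Boolean = exprId == other.exprId
  }

  sealed trait Direction
  case object Ascending extends Direction
  case object Descending extends Direction

  // Hypothetical stand-in for SortOrder.
  final case class Order(child: Attr, direction: Direction = Ascending) {
    def semanticEquals(other: Order): Boolean =
      direction == other.direction && child.semanticEquals(other.child)
  }

  // Literal comparison: case-class equality also compares the qualifier.
  def literallySatisfied(required: Seq[Order], childOrdering: Seq[Order]): Boolean =
    childOrdering.take(required.length) == required

  // Semantic comparison: the child ordering must cover the required ordering,
  // and each pair must be semantically equal.
  def semanticallySatisfied(required: Seq[Order], childOrdering: Seq[Order]): Boolean =
    required.length <= childOrdering.length &&
      required.zip(childOrdering).forall { case (r, c) => r.semanticEquals(c) }

  // The two orderings from the example: same exprId, different qualifier.
  val required: Seq[Order] = Seq(Order(Attr("col1", exprId = 1L, qualifier = Some("a"))))
  val childOrdering: Seq[Order] = Seq(Order(Attr("col1", exprId = 1L)))

  def main(args: Array[String]): Unit = {
    // With the literal check the planner would think a Sort is needed.
    println(s"literal check satisfied:  ${literallySatisfied(required, childOrdering)}")
    // With the semantic check the existing child ordering is accepted.
    println(s"semantic check satisfied: ${semanticallySatisfied(required, childOrdering)}")
  }
}
```

      In this model the literal check fails only because of the qualifier, which is the situation that leads to the extra Sort, while the semantic check accepts the child ordering.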

            People

              Assignee: Tejas Patil (tejasp)
              Reporter: Tejas Patil (tejasp)