Spark / SPARK-23368

Avoid unnecessary Exchange or Sort after projection

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:

      Description

      After a column-rename projection, ProjectExec's outputOrdering and outputPartitioning should reflect the projected (renamed) columns as well. For example,

      SELECT b1
      FROM (
          SELECT a a1, b b1
          FROM testData2
          ORDER BY a
      )
      ORDER BY a1

      The inner query's result is ordered on a1 as well, since a1 is just an alias of a. With a rule that eliminates a Sort over an already-sorted result, together with this fix, the ORDER BY in the outer query could be optimized out.
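To make the idea concrete, here is a minimal sketch of alias-aware ordering. It is a toy model in plain Python, not Spark's implementation: `SortKey`, `project_ordering`, and `sort_is_redundant` are hypothetical names introduced for illustration, and the redundancy check is simplified to an exact prefix match on column name and direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortKey:
    """One sort key: a column name and a direction (hypothetical toy type)."""
    column: str
    ascending: bool = True


def project_ordering(child_ordering, aliases):
    """Rewrite a child's ordering in terms of the projection's output names.

    `aliases` maps a child column name to its output name. Only a prefix of
    the ordering survives: once a key is projected away, the keys after it
    no longer describe how the output rows are ordered.
    """
    projected = []
    for key in child_ordering:
        if key.column not in aliases:
            break
        projected.append(SortKey(aliases[key.column], key.ascending))
    return projected


def sort_is_redundant(required, provided):
    """A Sort is unnecessary when the required keys are a prefix of the provided ordering."""
    return len(required) <= len(provided) and list(provided[:len(required)]) == list(required)


# Inner query: SELECT a a1, b b1 FROM testData2 ORDER BY a
inner_ordering = [SortKey("a")]
aliases = {"a": "a1", "b": "b1"}

# Outer query: ORDER BY a1
outer_required = [SortKey("a1")]

# Without the fix the projection still reports its ordering on "a".
without_fix = sort_is_redundant(outer_required, inner_ordering)
# With the fix the ordering is reported on the alias "a1".
with_fix = sort_is_redundant(outer_required, project_ordering(inner_ordering, aliases))

print(without_fix, with_fix)  # False True
```

In this model the outer sort is only recognized as redundant once the projection reports its ordering on a1 rather than on a, which is the behavior the description asks for.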

       

      Similarly, the below query

      SELECT *
      FROM (
          SELECT t1.a a1, t2.a a2, t1.b b1, t2.b b2
          FROM testData2 t1
          LEFT JOIN testData2 t2
          ON t1.a = t2.a
      )
      JOIN testData2 t3
      ON a1 = t3.a

      is equivalent to

      SELECT *
      FROM testData2 t1
      LEFT JOIN testData2 t2
      ON t1.a = t2.a
      JOIN testData2 t3
      ON t1.a = t3.a

      , so the unnecessary sorting and hash-partitioning that are already optimized out of the second query should be eliminated from the first query as well.
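The same reasoning applies to partitioning. The sketch below is a toy model in plain Python, not Spark's code: `project_partitioning` and `needs_exchange` are hypothetical helpers, the left join's output is assumed to be hash-partitioned on t1.a as the description implies, and "satisfies the requirement" is simplified to an exact match of the key tuples.

```python
def project_partitioning(child_keys, aliases):
    """Rename hash-partitioning keys to the projection's output names.

    Returns None (partitioning unknown) if any key is projected away,
    because the output can then no longer be described by those keys.
    """
    if all(key in aliases for key in child_keys):
        return tuple(aliases[key] for key in child_keys)
    return None


def needs_exchange(required_keys, provided_keys):
    """An Exchange is needed unless the provided keys match the required ones."""
    return provided_keys is None or tuple(required_keys) != tuple(provided_keys)


# Assumed output partitioning of: testData2 t1 LEFT JOIN testData2 t2 ON t1.a = t2.a
join_output_keys = ("t1.a",)

# Projection: SELECT t1.a a1, t2.a a2, t1.b b1, t2.b b2
aliases = {"t1.a": "a1", "t2.a": "a2", "t1.b": "b1", "t2.b": "b2"}

# Outer join condition: a1 = t3.a, so the left side must be partitioned on a1.
required_keys = ("a1",)

# Without the fix the projection still reports partitioning on "t1.a".
exchange_without_fix = needs_exchange(required_keys, join_output_keys)
# With the fix the partitioning is reported on the alias "a1".
exchange_with_fix = needs_exchange(
    required_keys, project_partitioning(join_output_keys, aliases)
)

print(exchange_without_fix, exchange_with_fix)  # True False
```

Under these assumptions the first query only matches the second query's plan once the projection carries the partitioning key through the alias.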

            People

            • Assignee: Unassigned
            • Reporter: maryannxue Wei Xue
            • Votes: 0
            • Watchers: 6
