SPARK-23368

Avoid unnecessary Exchange or Sort after projection

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      After a projection that only renames columns, ProjectExec's outputOrdering and outputPartitioning should be expressed in terms of the projected (renamed) columns as well. For example,

      SELECT b1
      FROM (
          SELECT a a1, b b1
          FROM testData2
          ORDER BY a
      )
      ORDER BY a1

      Because a1 is only an alias of a, the result of the inner query is ordered on a1 as well. If we had a rule to eliminate a Sort over an already sorted result, then together with this fix the ORDER BY in the outer query could be optimized out.
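      The idea can be sketched outside of Spark. The snippet below is a simplified model, not Spark's implementation: `project_ordering` and `sort_is_redundant` are hypothetical helper names, and a sort order is modeled as a plain list of (column, direction) pairs. It shows the child's ordering being rewritten through the alias map so that a later sort on a1 can be recognized as redundant.

```python
from typing import Dict, List, Tuple

# A sort order modeled as (column name, direction) pairs, e.g. [("a", "ASC")].
SortOrder = List[Tuple[str, str]]


def project_ordering(child_ordering: SortOrder, aliases: Dict[str, str]) -> SortOrder:
    """Rewrite the child's ordering in terms of the projection's output names.

    `aliases` maps a child column to its output name (hypothetical model).
    Ordering is a prefix property, so stop at the first column the
    projection does not keep.
    """
    projected: SortOrder = []
    for column, direction in child_ordering:
        if column not in aliases:
            break
        projected.append((aliases[column], direction))
    return projected


def sort_is_redundant(required: SortOrder, provided: SortOrder) -> bool:
    """A sort is redundant when the required order is a prefix of the provided one."""
    return provided[: len(required)] == required


# Inner query: SELECT a a1, b b1 FROM testData2 ORDER BY a
inner_ordering = project_ordering([("a", "ASC")], {"a": "a1", "b": "b1"})

# Outer query: ORDER BY a1
outer_sort_redundant = sort_is_redundant([("a1", "ASC")], inner_ordering)
```

      Without the remapping step the projection would still report its ordering on a, which the outer ORDER BY a1 cannot match, so the second sort would be kept.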

       

      Similarly, the below query

      SELECT *
      FROM (
          SELECT t1.a a1, t2.a a2, t1.b b1, t2.b b2
          FROM testData2 t1
          LEFT JOIN testData2 t2
          ON t1.a = t2.a
      )
      JOIN testData2 t3
      ON a1 = t3.a

      is equivalent to

      SELECT *
      FROM testData2 t1
      LEFT JOIN testData2 t2
      ON t1.a = t2.a
      JOIN testData2 t3
      ON t1.a = t3.a

      so the unnecessary sorting and hash partitioning that are already optimized out of the second query should be eliminated from the first query as well.
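      The same reasoning applies to partitioning. The sketch below is again a simplified model with hypothetical names (`project_partitioning`, `needs_exchange`), not Spark's planner code. It assumes the left join's output is hash-partitioned on t1.a and that the outer join requires its input to be clustered on a1.

```python
from typing import Dict, Optional, Tuple

Keys = Tuple[str, ...]


def project_partitioning(child_keys: Keys, aliases: Dict[str, str]) -> Optional[Keys]:
    """Rewrite hash-partitioning keys through the alias map.

    Returns None (partitioning unknown) if the projection drops any key.
    """
    if all(key in aliases for key in child_keys):
        return tuple(aliases[key] for key in child_keys)
    return None


def needs_exchange(required_clustering: Keys, provided_keys: Optional[Keys]) -> bool:
    """Model: hash partitioning satisfies a clustering requirement when all
    of its keys are among the required clustering keys."""
    if not provided_keys:
        return True
    return not set(provided_keys) <= set(required_clustering)


# Subquery: SELECT t1.a a1, t2.a a2, t1.b b1, t2.b b2 FROM ... ON t1.a = t2.a
aliases = {"t1.a": "a1", "t2.a": "a2", "t1.b": "b1", "t2.b": "b2"}
remapped = project_partitioning(("t1.a",), aliases)

# Outer join: ON a1 = t3.a requires clustering on a1.
exchange_with_remap = needs_exchange(("a1",), remapped)
exchange_without_remap = needs_exchange(("a1",), ("t1.a",))
```

      In this model the extra shuffle is planned only when the projection keeps reporting the pre-rename key t1.a, which matches the difference between the two queries described above.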

          People

            Assignee: Unassigned
            Reporter: Wei Xue (maryannxue)
            Votes: 0
            Watchers: 6
