Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-25603 Generalize Nested Column Pruning
  3. SPARK-27123

Improve CollapseProject to handle projects cross limit/repartition/sample

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.0.0
    • 3.0.0
    • SQL
    • None

    Description

      `CollapseProject` optimizer simplifies the plan by merging the adjacent projects and performing alias substitution.

      scala> sql("SELECT b c FROM (SELECT a b FROM t)").explain
      == Physical Plan ==
      *(1) Project [a#5 AS c#1]
      +- Scan hive default.t [a#5], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#5]
      

      We can do that more complex cases like the following.

      BEFORE

      scala> sql("SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM t)").explain
      == Physical Plan ==
      *(2) Project [b#0 AS c#1]
      +- Exchange RoundRobinPartitioning(1)
         +- *(1) Project [a#5 AS b#0]
            +- Scan hive default.t [a#5], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#5]
      

      AFTER

      scala> sql("SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM t)").explain
      == Physical Plan ==
      Exchange RoundRobinPartitioning(1)
      +- *(1) Project [a#11 AS c#7]
         +- Scan hive default.t [a#11], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#11]
      

      Attachments

        Activity

          People

            dongjoon Dongjoon Hyun
            dongjoon Dongjoon Hyun
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: