[SPARK-32361] Remove project if output is subset of child - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.1.0
Fix Version/s: None
Component/s: SQL
Labels: None

Description

We can remove a redundant `Project` node once column pruning has completed.

For example:

create table t1(c1 int, c2 int) using parquet;

explain extended
select sum(c1) from (
  select * from t1
);

Currently, this produces the following plan:

== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[sum(cast(c1#19 as bigint))], output=[sum(c1)#68L])
+- Exchange SinglePartition, true, [id=#86]
   +- *(1) HashAggregate(keys=[], functions=[partial_sum(cast(c1#19 as bigint))], output=[sum#70L])
      +- *(1) Project [c1#19]
         +- *(1) ColumnarToRow
            +- FileScan parquet default.t1[c1#19] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[hdfs:///user/hive/warehouse/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c1:int>

The `Project` is redundant here because its output is already produced by its child, so it can be removed:

== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[sum(cast(c1#19 as bigint))], output=[sum(c1)#68L])
+- Exchange SinglePartition, true, [id=#86]
   +- *(1) HashAggregate(keys=[], functions=[partial_sum(cast(c1#19 as bigint))], output=[sum#70L])
      +- *(1) ColumnarToRow
         +- FileScan parquet default.t1[c1#19] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[hdfs:///user/hive/warehouse/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c1:int>
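
The idea can be illustrated with a small, self-contained toy model. This is only a sketch and not Spark code: `Node`, `remove_redundant_project`, `plan_names` and `example_plan` are hypothetical names, each operator is reduced to a name plus a list of output attributes, and a `Project` is assumed to contain only plain attribute references (no aliases or computed expressions). Note that when the `Project` output is a strict subset of the child output, removing it widens the rows seen by the parent, so a real rule would also have to check that the parent tolerates the extra columns. In the plan above the two outputs are identical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Toy plan node: operator name, output attributes, optional single child."""
    name: str
    output: List[str]
    child: Optional["Node"] = None


def remove_redundant_project(node: Optional[Node]) -> Optional[Node]:
    """Drop each Project whose output attributes all appear in its child's output."""
    if node is None:
        return None
    # Rewrite bottom-up so nested redundant Projects are removed as well.
    child = remove_redundant_project(node.child)
    if (
        node.name == "Project"
        and child is not None
        and set(node.output) <= set(child.output)
    ):
        return child
    return Node(node.name, node.output, child)


def plan_names(node: Optional[Node]) -> List[str]:
    """Operator names from the root of the plan down to the leaf."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.child
    return names


def example_plan() -> Node:
    """The plan from this ticket, reduced to names and output attributes."""
    scan = Node("FileScan", ["c1#19"])
    to_row = Node("ColumnarToRow", ["c1#19"], scan)
    project = Node("Project", ["c1#19"], to_row)
    partial = Node("HashAggregate", ["sum#70L"], project)
    exchange = Node("Exchange", ["sum#70L"], partial)
    return Node("HashAggregate", ["sum(c1)#68L"], exchange)


if __name__ == "__main__":
    print(plan_names(remove_redundant_project(example_plan())))
    # ['HashAggregate', 'Exchange', 'HashAggregate', 'ColumnarToRow', 'FileScan']
```

A `Project` that introduces an attribute its child does not produce (for example a computed column) fails the subset check and is kept.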

People

Assignee: Unassigned

Reporter: XiDuo You

Votes: 0

Watchers: 2

Dates

Created: 20/Jul/20 01:08

Updated: 12/Dec/22 18:10