Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
3.1.0
None
None
Description
We can remove some redundant `Project` nodes after column pruning has completed.
For example:
create table t1(c1 int, c2 int) using parquet;
explain extended select sum(c1) from ( select * from t1 );
Currently, we get this plan:
== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[sum(cast(c1#19 as bigint))], output=[sum(c1)#68L])
+- Exchange SinglePartition, true, [id=#86]
   +- *(1) HashAggregate(keys=[], functions=[partial_sum(cast(c1#19 as bigint))], output=[sum#70L])
      +- *(1) Project [c1#19]
         +- *(1) ColumnarToRow
            +- FileScan parquet default.t1[c1#19] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[hdfs:///user/hive/warehouse/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c1:int>
We can remove the `Project`, because it outputs exactly the column its child already produces (`c1#19`). The plan then becomes:
== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[sum(cast(c1#19 as bigint))], output=[sum(c1)#68L])
+- Exchange SinglePartition, true, [id=#86]
   +- *(1) HashAggregate(keys=[], functions=[partial_sum(cast(c1#19 as bigint))], output=[sum#70L])
      +- *(1) ColumnarToRow
         +- FileScan parquet default.t1[c1#19] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[hdfs:///user/hive/warehouse/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c1:int>
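To make the idea concrete, here is a minimal Python sketch of such a rewrite over a toy plan tree. `Attr`, `PlanNode`, `remove_redundant_projects`, and `render` are hypothetical names for illustration only, not Spark's API. The sketch assumes a `Project` is redundant when it has a single child and outputs exactly the same attributes (compared by expression ID) in the same order as that child. A real planner rule would likely have to check more than this, which the sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Attr:
    """A resolved column reference such as c1#19 (name plus expression id)."""
    name: str
    expr_id: int

    def __str__(self) -> str:
        return f"{self.name}#{self.expr_id}"


@dataclass
class PlanNode:
    """Hypothetical, simplified physical operator: label, output columns, children."""
    label: str
    output: list[Attr]
    children: list[PlanNode] = field(default_factory=list)


def remove_redundant_projects(node: PlanNode) -> PlanNode:
    """Bottom-up pass that drops a Project whose output equals its child's output."""
    new_children = [remove_redundant_projects(c) for c in node.children]
    node = PlanNode(node.label, node.output, new_children)
    if node.label == "Project" and len(node.children) == 1:
        child = node.children[0]
        # Same expression ids in the same order: the Project changes nothing.
        if [a.expr_id for a in node.output] == [a.expr_id for a in child.output]:
            return child
    return node


def render(node: PlanNode, depth: int = 0) -> list[str]:
    """Print the tree with the "+- " indentation used by Spark's explain output."""
    prefix = "" if depth == 0 else "   " * (depth - 1) + "+- "
    cols = ", ".join(str(a) for a in node.output)
    lines = [f"{prefix}{node.label} [{cols}]"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Usage: a simplified version of the plan from this issue.
c1 = Attr("c1", 19)
scan = PlanNode("FileScan parquet default.t1", [c1])
columnar_to_row = PlanNode("ColumnarToRow", [c1], [scan])
project = PlanNode("Project", [c1], [columnar_to_row])
partial = PlanNode("HashAggregate(partial_sum)", [Attr("sum", 70)], [project])
exchange = PlanNode("Exchange SinglePartition", [Attr("sum", 70)], [partial])
final = PlanNode("HashAggregate(sum)", [Attr("sum(c1)", 68)], [exchange])

optimized = remove_redundant_projects(final)
print("\n".join(render(optimized)))
```

A `Project` that actually prunes or reorders columns (for example `Project [c1#19]` over a scan producing `[c1#19, c2#20]`) is kept, because its output differs from its child's.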