Description
I was using SELECT TRANSFORM ... USING on branch-2.1/2.2 and noticed that it scans all columns of the underlying data. For a query like:
SELECT TRANSFORM(usid, cch) USING 'python test.py' AS (u1, c1, u2, c2) FROM test_table;
its physical plan looks like:
== Physical Plan ==
ScriptTransformation [usid#17, cch#9], python test.py, [u1#784, c1#785, u2#786, c2#787], HiveScriptIOSchema(List(),List(),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),List((field.delim, )),List((field.delim, )),Some(org.apache.hadoop.hive.ql.exec.TextRecordReader),Some(org.apache.hadoop.hive.ql.exec.TextRecordWriter),false)
+- FileScan parquet [sh#0L,clk#1L,chg#2L,qey#3,ship#4,chgname#5,sid#6,bid#7,dis#8L,cch#9,wch#10,wid#11L,arank#12L,rtag#13,iid#14,uid#15,pid#16,usid#17,wdid#18,bid#19,oqleft#20,oqright#21,poqvalue#22,tm#23,... 367 more fields] Batched: false, Format: Parquet, Location: InMemoryFileIndex[file:/Users/Downloads/part-r-00093-0ef5d59f-2e08-4085-9b46-458a1652932a.g..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<sh:bigint,clk:bigint,chg:bigint,qey:string,ship:string,chgname:string,s...
In our scenario the Parquet file has about 400 columns, so this query takes much longer than it should: the transform only consumes usid and cch, yet the FileScan's ReadSchema covers every column.
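For reference, below is a minimal reproduction sketch for the spark-shell. Everything in it beyond the original query is an assumption for illustration: the Hive-enabled session, the /tmp/test_table path, and the filler columns f1..f398. Calling explain() is enough to surface the plan, so 'python test.py' never has to exist.

import org.apache.spark.sql.SparkSession

// TRANSFORM requires a Hive-enabled session on Spark 2.x.
val spark = SparkSession.builder()
  .appName("transform-column-pruning-repro")
  .enableHiveSupport()
  .getOrCreate()

// Build a wide Parquet table: usid, cch, plus 398 filler columns (~400 total).
val cols = Seq("id AS usid", "id AS cch") ++ (1 to 398).map(i => s"id AS f$i")
spark.range(1000).selectExpr(cols: _*)
  .write.mode("overwrite").parquet("/tmp/test_table")
spark.read.parquet("/tmp/test_table").createOrReplaceTempView("test_table")

// explain() alone shows the problem; the script is never executed.
spark.sql(
  "SELECT TRANSFORM(usid, cch) USING 'python test.py' AS (u1, c1, u2, c2) FROM test_table"
).explain()
// On branch-2.1/2.2 the FileScan's ReadSchema lists all ~400 columns,
// even though the ScriptTransformation only consumes usid and cch.

A possible workaround (untested here) is to project the needed columns explicitly first, e.g. FROM (SELECT usid, cch FROM test_table) t, which may allow the scan to be pruned down to those two columns.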