Description
sql with lateral view didn't push down partition column as expected!.
here is how it can be reproduced.
1. create test table
create table test_lateral_view (id bigint,json_cont string) partitioned by (vt string);
2. explain below sql
select *
from test_lateral_view a
lateral view json_tuple(json_cont, 'iids', 'indexs') b as iids,indexs
where a.vt = '2016-10-27'
and rand()>0.5;
here is my result:
STAGE DEPENDENCIES:
Stage-0 is a root stageSTAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: a
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Lateral View Forward
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: id (type: bigint), json_cont (type: string), vt (type: string)
outputColumnNames: id, json_cont, vt
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Lateral View Join Operator
outputColumnNames: _col0, _col1, _col2, _col6, _col7
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: ((_col2 = '2016-10-27') and (rand() > 0.5)) (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: _col0 (type: bigint), _col1 (type: string), '2016-10-27' (type: string), _col6 (type: string), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
ListSink
Select Operator
expressions: json_cont (type: string), 'iids' (type: string), 'indexs' (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
UDTF Operator
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
function name: json_tuple
Lateral View Join Operator
outputColumnNames: _col0, _col1, _col2, _col6, _col7
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: ((_col2 = '2016-10-27') and (rand() > 0.5)) (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: _col0 (type: bigint), _col1 (type: string), '2016-10-27' (type: string), _col6 (type: string), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
ListSink
As you can see,the partition column is in filter operator,which means this sql will scan the whole table.