Description
Simplify predicates for disjunctive predicates so that can get pushed down to the scan.
For TPC-DS query 13 we push down predicates in the following form
where c_martial_status in ('M','D','U') etc..
select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001 and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'M' and customer_demographics.cd_education_status = '4 yr Degree' and store_sales.ss_sales_price between 100.00 and 150.00 and household_demographics.hd_dep_count = 3 )or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'D' and customer_demographics.cd_education_status = 'Primary' and store_sales.ss_sales_price between 50.00 and 100.00 and household_demographics.hd_dep_count = 1 ) or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = ss_cdemo_sk and customer_demographics.cd_marital_status = 'U' and customer_demographics.cd_education_status = 'Advanced Degree' and store_sales.ss_sales_price between 150.00 and 200.00 and household_demographics.hd_dep_count = 1 )) and((store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('KY', 'GA', 'NM') and store_sales.ss_net_profit between 100 and 200 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('MT', 'OR', 'IN') and store_sales.ss_net_profit between 150 and 300 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('WI', 'MO', 'WV') and store_sales.ss_net_profit between 50 and 250 )) ;
This is the plan currently generated without any predicate simplification
STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 <- Map 8 (BROADCAST_EDGE) Map 8 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1 Vertices: Map 1 Map Operator Tree: TableScan alias: customer_address Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Execution mode: vectorized Map 5 Map Operator Tree: TableScan alias: household_demographics Statistics: Num rows: 7200 Data size: 770400 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: hd_demo_sk (type: int), hd_dep_count (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 7200 Data size: 770400 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 7200 Data size: 770400 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: int) Execution mode: vectorized Map 6 Map Operator Tree: TableScan alias: store filterExpr: (true and s_store_sk is not null) (type: boolean) Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: s_store_sk is not null (type: boolean) Statistics: Num rows: 852 Data size: 1628138 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: s_store_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 852 Data size: 1628138 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 852 Data size: 1628138 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int) Execution mode: vectorized Map 7 Map Operator Tree: TableScan alias: store_sales filterExpr: (ss_store_sk is not null and ss_sold_date_sk is not null) (type: boolean) Statistics: Num rows: 82510879939 Data size: 7203833257964 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (ss_store_sk is not null and ss_sold_date_sk is not null) (type: boolean) Statistics: Num rows: 20627719985 Data size: 1800958314512 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ss_sold_date_sk (type: int), ss_cdemo_sk (type: int), ss_hdemo_sk (type: int), ss_addr_sk (type: int), ss_store_sk (type: int), ss_quantity (type: int), ss_sales_price (type: float), ss_ext_sales_price (type: float), ss_ext_wholesale_cost (type: float), ss_net_profit (type: float) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9 Statistics: Num rows: 20627719985 Data size: 1800958314512 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col1} {_col2} {_col4} {_col5} 1 {_col0} {_col1} {_col2} {_col3} {_col5} {_col6} {_col7} {_col8} {_col9} keys: 0 _col3 (type: int) 1 _col4 (type: int) outputColumnNames: _col0, _col1, _col2, _col4, _col5, _col6, _col7, _col8, _col9, _col11, _col12, _col13, _col14, _col15 input vertices: 0 Map 8 Statistics: Num rows: 22690492416 Data size: 1981054320640 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (((_col8 = _col4) and ((_col0 = _col7) and ((_col1 = 'M') and ((_col2 = '4 yr Degree') and (_col12 BETWEEN 100 AND 150 and (_col5 = 3)))))) or (((_col8 = _col4) and ((_col0 = _col7) and ((_col1 = 'D') and ((_col2 = 'Primary') and (_col12 BETWEEN 50 AND 100 and (_col5 = 1)))))) or ((_col8 = _col4) and ((_col0 = _col7) and ((_col1 = 'U') and ((_col2 = 'Advanced Degree') and (_col12 BETWEEN 150 AND 200 and (_col5 = 1)))))))) (type: boolean) Statistics: Num rows: 1063616832 Data size: 92861921280 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col6 (type: int), _col9 (type: int), _col11 (type: int), _col13 (type: float), _col14 (type: float), _col15 (type: float) outputColumnNames: _col0, _col3, _col5, _col7, _col8, _col9 Statistics: Num rows: 1063616832 Data size: 92861921280 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1063616832 Data size: 92861921280 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col3 (type: int), _col5 (type: int), _col7 (type: float), _col8 (type: float), _col9 (type: float) Execution mode: vectorized Map 8 Map Operator Tree: TableScan alias: customer_demographics Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col1} {_col2} 1 {_col0} keys: 0 1 outputColumnNames: _col0, _col1, _col2, _col3 input vertices: 1 Map 6 Statistics: Num rows: 2112880 Data size: 790217152 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col1} {_col2} {_col3} 1 {_col0} {_col1} keys: 0 1 outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 input vertices: 1 Map 5 Statistics: Num rows: 2324168 Data size: 869238912 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col3 (type: int) sort order: + Map-reduce partition columns: _col3 (type: int) Statistics: Num rows: 2324168 Data size: 869238912 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string), _col4 (type: int), _col5 (type: int) Execution mode: vectorized Reducer 2 Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col3} {VALUE._col5} {VALUE._col7} {VALUE._col8} {VALUE._col9} 1 {VALUE._col0} {VALUE._col1} {VALUE._col2} outputColumnNames: _col0, _col3, _col5, _col7, _col8, _col9, _col16, _col17, _col18 Statistics: Num rows: 1169978496 Data size: 102148120576 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (((_col3 = _col16) and ((_col18 = 'United States') and ((_col17) IN ('KY', 'GA', 'NM') and _col9 BETWEEN 100 AND 200))) or (((_col3 = _col16) and ((_col18 = 'United States') and ((_col17) IN ('MT', 'OR', 'IN') and _col9 BETWEEN 150 AND 300))) or ((_col3 = _col16) and ((_col18 = 'United States') and ((_col17) IN ('WI', 'MO', 'WV') and _col9 BETWEEN 50 AND 250))))) (type: boolean) Statistics: Num rows: 219370968 Data size: 19152772608 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: int), _col5 (type: int), _col7 (type: float), _col8 (type: float) outputColumnNames: _col0, _col5, _col7, _col8 Statistics: Num rows: 219370968 Data size: 19152772608 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col5} {_col7} {_col8} 1 keys: 0 _col0 (type: int) 1 _col0 (type: int) outputColumnNames: _col5, _col7, _col8 input vertices: 1 Map 4 Statistics: Num rows: 241308080 Data size: 21068050432 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col5 (type: int), _col7 (type: float), _col8 (type: float) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 241308080 Data size: 21068050432 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: avg(_col0), avg(_col1), avg(_col2), sum(_col2) mode: hash outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: struct<count:bigint,sum:double,input:int>), _col1 (type: struct<count:bigint,sum:double,input:float>), _col2 (type: struct<count:bigint,sum:double,input:float>), _col3 (type: double) Reducer 3 Reduce Operator Tree: Group By Operator aggregations: avg(VALUE._col0), avg(VALUE._col1), avg(VALUE._col2), sum(VALUE._col3) mode: mergepartial outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: double), _col1 (type: double), _col2 (type: double), _col3 (type: double) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink
Attachments
Issue Links
- is cloned by
-
HIVE-7914 Simplify join predicates for CBO to avoid cross products
- Closed