Description
Simplify predicates for disjunctive predicates so that can get pushed down to the scan.
Looks like this is still an issue, some of the filters can be pushed down to the scan.
set hive.cbo.enable=true set hive.stats.fetch.column.stats=true set hive.exec.dynamic.partition.mode=nonstrict set hive.tez.auto.reducer.parallelism=true set hive.auto.convert.join.noconditionaltask.size=320000000 set hive.exec.reducers.bytes.per.reducer=100000000 set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager set hive.support.concurrency=false set hive.tez.exec.print.summary=true explain select substr(r_reason_desc,1,20) as r ,avg(ws_quantity) wq ,avg(wr_refunded_cash) ref ,avg(wr_fee) fee from web_sales, web_returns, web_page, customer_demographics cd1, customer_demographics cd2, customer_address, date_dim, reason where web_sales.ws_web_page_sk = web_page.wp_web_page_sk and web_sales.ws_item_sk = web_returns.wr_item_sk and web_sales.ws_order_number = web_returns.wr_order_number and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998 and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk and reason.r_reason_sk = web_returns.wr_reason_sk and ( ( cd1.cd_marital_status = 'M' and cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = '4 yr Degree' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 100.00 and 150.00 ) or ( cd1.cd_marital_status = 'D' and cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = 'Primary' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 50.00 and 100.00 ) or ( cd1.cd_marital_status = 'U' and cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = 'Advanced Degree' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 150.00 and 200.00 ) ) and ( ( ca_country = 'United States' and ca_state in ('KY', 'GA', 'NM') and ws_net_profit between 100 and 200 ) or ( ca_country = 'United States' and ca_state in ('MT', 'OR', 'IN') and ws_net_profit between 150 and 300 ) or ( ca_country = 'United States' and ca_state in ('WI', 'MO', 'WV') and ws_net_profit between 50 and 250 ) ) group by r_reason_desc order by r, wq, ref, fee limit 100 OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 9 <- Map 1 (BROADCAST_EDGE) Reducer 3 <- Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE) Reducer 4 <- Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE) Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE) Reducer 6 <- Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE) Reducer 7 <- Reducer 6 (SIMPLE_EDGE) Reducer 8 <- Reducer 7 (SIMPLE_EDGE) DagName: mmokhtar_20141111161818_f5fd23ba-d783-4b13-8507-7faa65851798:1 Vertices: Map 1 Map Operator Tree: TableScan alias: web_page filterExpr: wp_web_page_sk is not null (type: boolean) Statistics: Num rows: 4602 Data size: 2696178 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: wp_web_page_sk is not null (type: boolean) Statistics: Num rows: 4602 Data size: 18408 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: wp_web_page_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 4602 Data size: 18408 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 4602 Data size: 18408 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 10 Map Operator Tree: TableScan alias: customer_address filterExpr: ((ca_country = 'United States') and ca_address_sk is not null) (type: boolean) Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((ca_country = 'United States') and ca_address_sk is not null) (type: boolean) Statistics: Num rows: 20000000 Data size: 3740000000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ca_address_sk (type: int), ca_state (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 20000000 Data size: 1800000000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 20000000 Data size: 1800000000 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string) Execution mode: vectorized Map 11 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 1998) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((d_year = 1998) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 652 Data size: 5216 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int) outputColumnNames: _col0 Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator keys: _col0 (type: int) mode: hash outputColumnNames: _col0 Statistics: Num rows: 326 Data size: 1304 Basic stats: COMPLETE Column stats: COMPLETE Dynamic Partitioning Event Operator Target Input: web_sales Partition key expr: ws_sold_date_sk Statistics: Num rows: 326 Data size: 1304 Basic stats: COMPLETE Column stats: COMPLETE Target column: ws_sold_date_sk Target Vertex: Map 9 Execution mode: vectorized Map 12 Map Operator Tree: TableScan alias: reason filterExpr: r_reason_sk is not null (type: boolean) Statistics: Num rows: 72 Data size: 14400 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: r_reason_sk is not null (type: boolean) Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: r_reason_sk (type: int), r_reason_desc (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string) Execution mode: vectorized Map 13 Map Operator Tree: TableScan alias: web_returns filterExpr: (((((wr_refunded_cdemo_sk is not null and wr_item_sk is not null) and wr_order_number is not null) and wr_returning_cdemo_sk is not null) and wr_refunded_addr_sk is not null) and wr_reason_sk is not null) (type: boolean) Statistics: Num rows: 2062802370 Data size: 185695406284 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((((wr_refunded_cdemo_sk is not null and wr_item_sk is not null) and wr_order_number is not null) and wr_returning_cdemo_sk is not null) and wr_refunded_addr_sk is not null) and wr_reason_sk is not null) (type: boolean) Statistics: Num rows: 1875154722 Data size: 58944640412 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: wr_item_sk (type: int), wr_refunded_cdemo_sk (type: int), wr_refunded_addr_sk (type: int), wr_returning_cdemo_sk (type: int), wr_reason_sk (type: int), wr_order_number (type: int), wr_fee (type: float), wr_refunded_cash (type: float) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7 Statistics: Num rows: 1875154722 Data size: 58944640412 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col1 (type: int) sort order: + Map-reduce partition columns: _col1 (type: int) Statistics: Num rows: 1875154722 Data size: 58944640412 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col0 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col6 (type: float), _col7 (type: float) Execution mode: vectorized Map 14 Map Operator Tree: TableScan alias: cd1 filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean) Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean) Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string) sort order: +++ Map-reduce partition columns: _col0 (type: int), _col1 (type: string), _col2 (type: string) Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 2 Map Operator Tree: TableScan alias: cd1 filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean) Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean) Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 9 Map Operator Tree: TableScan alias: web_sales filterExpr: ((ws_web_page_sk is not null and ws_item_sk is not null) and ws_order_number is not null) (type: boolean) Statistics: Num rows: 21594638446 Data size: 2850189889652 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((ws_web_page_sk is not null and ws_item_sk is not null) and ws_order_number is not null) (type: boolean) Statistics: Num rows: 21591939929 Data size: 604541956128 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ws_item_sk (type: int), ws_web_page_sk (type: int), ws_order_number (type: int), ws_quantity (type: int), ws_sales_price (type: float), ws_net_profit (type: float), ws_sold_date_sk (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6 Statistics: Num rows: 21591939929 Data size: 604541956128 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col2} {_col3} {_col4} {_col5} {_col6} 1 keys: 0 _col1 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6 input vertices: 1 Map 1 Statistics: Num rows: 21591939072 Data size: 518206537728 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int), _col2 (type: int) sort order: ++ Map-reduce partition columns: _col0 (type: int), _col2 (type: int) Statistics: Num rows: 21591939072 Data size: 518206537728 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col3 (type: int), _col4 (type: float), _col5 (type: float), _col6 (type: int) Execution mode: vectorized Reducer 3 Reduce Operator Tree: Merge Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4} {VALUE._col5} {VALUE._col6} 1 {VALUE._col0} {VALUE._col1} outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col9, _col10 Statistics: Num rows: 1875154688 Data size: 373155782912 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int), _col10 (type: string), _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col6 (type: float), _col7 (type: float), _col9 (type: string) outputColumnNames: _col0, _col10, _col2, _col3, _col4, _col5, _col6, _col7, _col9 Statistics: Num rows: 1875154688 Data size: 373155782912 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int), _col5 (type: int) sort order: ++ Map-reduce partition columns: _col0 (type: int), _col5 (type: int) Statistics: Num rows: 1875154688 Data size: 373155782912 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col2 (type: int), _col3 (type: int), _col4 (type: int), _col6 (type: float), _col7 (type: float), _col9 (type: string), _col10 (type: string) Reducer 4 Reduce Operator Tree: Merge Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4} 1 {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4} {VALUE._col5} {VALUE._col7} {VALUE._col8} outputColumnNames: _col3, _col4, _col5, _col6, _col10, _col11, _col12, _col14, _col15, _col17, _col18 Statistics: Num rows: 57653145 Data size: 11472975855 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((_col17 = 'M') and ((_col18 = '4 yr Degree') and _col4 BETWEEN 100.0 AND 150.0)) or (((_col17 = 'D') and ((_col18 = 'Primary') and _col4 BETWEEN 50.0 AND 100.0)) or ((_col17 = 'U') and ((_col18 = 'Advanced Degree') and _col4 BETWEEN 150.0 AND 200.0)))) (type: boolean) Statistics: Num rows: 57653145 Data size: 11472975855 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col11 (type: int), _col12 (type: int), _col14 (type: float), _col15 (type: float), _col17 (type: string), _col18 (type: string), _col3 (type: int), _col5 (type: float), _col6 (type: int), _col10 (type: int) outputColumnNames: _col10, _col11, _col13, _col14, _col17, _col18, _col3, _col5, _col6, _col9 Statistics: Num rows: 57653145 Data size: 11472975855 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col10 (type: int), _col17 (type: string), _col18 (type: string) sort order: +++ Map-reduce partition columns: _col10 (type: int), _col17 (type: string), _col18 (type: string) Statistics: Num rows: 57653145 Data size: 11472975855 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col3 (type: int), _col5 (type: float), _col6 (type: int), _col9 (type: int), _col11 (type: int), _col13 (type: float), _col14 (type: float) Reducer 5 Reduce Operator Tree: Merge Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {VALUE._col3} {VALUE._col5} {VALUE._col6} {VALUE._col9} {VALUE._col10} {VALUE._col12} {VALUE._col13} outputColumnNames: _col6, _col8, _col9, _col12, _col14, _col16, _col17 Statistics: Num rows: 3187317548 Data size: 50997080768 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col12 (type: int), _col14 (type: int), _col16 (type: float), _col17 (type: float), _col6 (type: int), _col8 (type: float), _col9 (type: int) outputColumnNames: _col12, _col14, _col16, _col17, _col6, _col8, _col9 Statistics: Num rows: 3187317548 Data size: 50997080768 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col12 (type: int) sort order: + Map-reduce partition columns: _col12 (type: int) Statistics: Num rows: 3187317548 Data size: 50997080768 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col6 (type: int), _col8 (type: float), _col9 (type: int), _col14 (type: int), _col16 (type: float), _col17 (type: float) Reducer 6 Reduce Operator Tree: Merge Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} 1 {VALUE._col6} {VALUE._col8} {VALUE._col9} {VALUE._col13} {VALUE._col15} {VALUE._col16} outputColumnNames: _col1, _col9, _col11, _col12, _col17, _col19, _col20 Statistics: Num rows: 1593658752 Data size: 156178557696 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((_col1) IN ('KY', 'GA', 'NM') and _col11 BETWEEN 100 AND 200) or (((_col1) IN ('MT', 'OR', 'IN') and _col11 BETWEEN 150 AND 300) or ((_col1) IN ('WI', 'MO', 'WV') and _col11 BETWEEN 50 AND 250))) (type: boolean) Statistics: Num rows: 1195244064 Data size: 117133918272 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col17 (type: int), _col19 (type: float), _col20 (type: float), _col9 (type: int), _col12 (type: int) outputColumnNames: _col11, _col13, _col14, _col3, _col6 Statistics: Num rows: 1195244064 Data size: 14342928768 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col3} {_col11} {_col13} {_col14} keys: 0 _col0 (type: int) 1 _col6 (type: int) outputColumnNames: _col5, _col13, _col15, _col16 input vertices: 0 Map 11 Statistics: Num rows: 1334416318 Data size: 16012995816 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col13 (type: int), _col15 (type: float), _col16 (type: float), _col5 (type: int) outputColumnNames: _col13, _col15, _col16, _col5 Statistics: Num rows: 1334416318 Data size: 16012995816 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col1} 1 {_col5} {_col15} {_col16} keys: 0 _col0 (type: int) 1 _col13 (type: int) outputColumnNames: _col1, _col7, _col17, _col18 input vertices: 0 Map 12 Statistics: Num rows: 1334416256 Data size: 140113706880 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col1 (type: string), _col7 (type: int), _col18 (type: float), _col17 (type: float) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 1334416256 Data size: 140113706880 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator aggregations: avg(_col1), avg(_col2), avg(_col3) keys: _col0 (type: string) mode: hash outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 157024 Data size: 15231328 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 157024 Data size: 15231328 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: struct<count:bigint,sum:double,input:int>), _col2 (type: struct<count:bigint,sum:double,input:float>), _col3 (type: struct<count:bigint,sum:double,input:float>) Reducer 7 Reduce Operator Tree: Group By Operator aggregations: avg(VALUE._col0), avg(VALUE._col1), avg(VALUE._col2) keys: KEY._col0 (type: string) mode: mergepartial outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 112 Data size: 13552 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: substr(_col0, 1, 20) (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 112 Data size: 23296 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double) sort order: ++++ Statistics: Num rows: 112 Data size: 23296 Basic stats: COMPLETE Column stats: COMPLETE TopN Hash Memory Usage: 0.04 Reducer 8 Reduce Operator Tree: Select Operator expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: double), KEY.reducesinkkey2 (type: double), KEY.reducesinkkey3 (type: double) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 112 Data size: 23296 Basic stats: COMPLETE Column stats: COMPLETE Limit Number of rows: 100 Statistics: Num rows: 100 Data size: 20800 Basic stats: COMPLETE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 100 Data size: 20800 Basic stats: COMPLETE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Execution mode: vectorized Stage: Stage-0
Attachments
Attachments
Issue Links
- is related to
-
HIVE-12021 HivePreFilteringRule may introduce wrong common operands
- Closed
-
HIVE-11424 Rule to transform OR clauses into IN clauses in CBO
- Closed
- relates to
-
HIVE-10770 Recognize additional common factors in Filter predicates
- Open
-
HIVE-7914 Simplify join predicates for CBO to avoid cross products
- Closed