Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
2.0.0
None
Description
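With CBO disabled, the plan contains a redundant Filter Operator in the reducer vertex (Reducer 2). It re-evaluates the join conditions and the BETWEEN date-range predicates after both joins have already run. The CBO-enabled plan for the same query has no such operator.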
Query
select ss_item_sk, ss_ticket_number, ss_store_sk
from store_sales a, store_returns b, store
where a.ss_item_sk = b.sr_item_sk
  and a.ss_ticket_number = b.sr_ticket_number
  and ss_sold_date_sk between 2450816 and 2451500
  and sr_returned_date_sk between 2450816 and 2451500
  and s_store_sk = ss_store_sk;
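A minimal sketch of how the two plans in this issue could be produced for comparison. The reporter's exact session settings are not given here, so this is an assumption: it toggles the standard `hive.cbo.enable` property and runs `EXPLAIN` on the query from the description, with Tez selected as the engine because both plans show Tez vertices.

```sql
-- Assumption: the session runs on Tez, as the plans in this issue suggest.
set hive.execution.engine=tez;

-- Plan with CBO disabled: look for a Filter Operator after the joins in the reducer vertex.
set hive.cbo.enable=false;
explain
select ss_item_sk, ss_ticket_number, ss_store_sk
from store_sales a, store_returns b, store
where a.ss_item_sk = b.sr_item_sk
  and a.ss_ticket_number = b.sr_ticket_number
  and ss_sold_date_sk between 2450816 and 2451500
  and sr_returned_date_sk between 2450816 and 2451500
  and s_store_sk = ss_store_sk;

-- Plan with CBO enabled, for comparison.
set hive.cbo.enable=true;
explain
select ss_item_sk, ss_ticket_number, ss_store_sk
from store_sales a, store_returns b, store
where a.ss_item_sk = b.sr_item_sk
  and a.ss_ticket_number = b.sr_ticket_number
  and ss_sold_date_sk between 2450816 and 2451500
  and sr_returned_date_sk between 2450816 and 2451500
  and s_store_sk = ss_store_sk;
```

Other settings that affect the plans (join conversion, statistics) may differ from the reporter's environment, so the output need not match the plans in this issue exactly.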
Plan snippet
Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
Full plan with CBO disabled
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 (SIMPLE_EDGE)
      DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: b
                  filterExpr: ((sr_item_sk is not null and sr_ticket_number is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
                  Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                    Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                      sort order: ++
                      Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
                      Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: sr_returned_date_sk (type: int)
            Execution mode: vectorized
        Map 3
            Map Operator Tree:
                TableScan
                  alias: store
                  filterExpr: s_store_sk is not null (type: boolean)
                  Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: s_store_sk is not null (type: boolean)
                    Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: s_store_sk (type: int)
                      sort order: +
                      Map-reduce partition columns: s_store_sk (type: int)
                      Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: a
                  filterExpr: (((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
                  Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) (type: boolean)
                    Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
                      sort order: ++
                      Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
                      Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: ss_store_sk (type: int), ss_sold_date_sk (type: int)
            Execution mode: vectorized
        Reducer 2
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {KEY.reducesinkkey0} {VALUE._col5} {KEY.reducesinkkey1} {VALUE._col20}
                  1 {KEY.reducesinkkey0} {KEY.reducesinkkey1} {VALUE._col17}
                outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45
                Statistics: Num rows: 57439343 Data size: 1148786860 Basic stats: COMPLETE Column stats: COMPLETE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  condition expressions:
                    0 {_col1} {_col6} {_col8} {_col22} {_col27} {_col34} {_col45}
                    1 {s_store_sk}
                  keys:
                    0 _col6 (type: int)
                    1 s_store_sk (type: int)
                  outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45, _col49
                  input vertices:
                    1 Map 3
                  Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
                    Statistics: Num rows: 1794979 Data size: 57439328 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
Full plan with CBO enabled
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 4 <- Map 1 (BROADCAST_EDGE)
        Reducer 3 <- Map 2 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
      DagName: mmokhtar_20150214182525_63a9838f-db9f-40e9-8ae1-77c77143dccf:12
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store
                  filterExpr: s_store_sk is not null (type: boolean)
                  Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: s_store_sk is not null (type: boolean)
                    Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: s_store_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 2
            Map Operator Tree:
                TableScan
                  alias: b
                  filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                  Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                    Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int), _col1 (type: int)
                        sort order: ++
                        Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
                        Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: a
                  filterExpr: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
                  Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
                    Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: ss_item_sk (type: int), ss_store_sk (type: int), ss_ticket_number (type: int)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        condition expressions:
                          0 {_col0} {_col1} {_col2}
                          1
                        keys:
                          0 _col1 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0, _col1, _col2
                        input vertices:
                          1 Map 1
                        Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col0 (type: int), _col2 (type: int)
                          sort order: ++
                          Map-reduce partition columns: _col0 (type: int), _col2 (type: int)
                          Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col1 (type: int)
            Execution mode: vectorized
        Reducer 3
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {KEY.reducesinkkey0} {VALUE._col0} {KEY.reducesinkkey1}
                  1
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: int), _col2 (type: int), _col1 (type: int)
                  outputColumnNames: _col0, _col1, _col2
                  Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
Attachments
Issue Links
- breaks: HIVE-13693 Multi-insert query drops Filter before file output when there is a.val <> b.val (Closed)
- links to