Description
This is a regression introduced in the latest build of the CBO branch.
Removing the subquery for item will remove the cross products
Query
select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, date_dim, item where item.i_item_id in (select i.i_item_id from item i where i_color in ('purple','burlywood','indian')) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 group by i_item_id;
Warning: Map Join MAPJOIN[38][bigTable=?] in task 'Map 1' is a cross product Warning: Map Join MAPJOIN[39][bigTable=store_sales] in task 'Map 4' is a cross product OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 <- Map 3 (BROADCAST_EDGE) Map 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE) Reducer 5 <- Map 4 (SIMPLE_EDGE) DagName: mmokhtar_20140904141313_9c253f7e-aad1-4ca4-9be1-ea45e3d34496:1 Vertices: Map 1 Map Operator Tree: TableScan alias: item filterExpr: (true and i_item_id is not null) (type: boolean) Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: i_item_id is not null (type: boolean) Statistics: Num rows: 231000 Data size: 331931080 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {i_item_sk} {i_item_id} 1 {d_date_sk} keys: 0 1 outputColumnNames: _col0, _col1, _col25 input vertices: 1 Map 3 Statistics: Num rows: 254100 Data size: 365124192 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: int), _col1 (type: string), _col25 (type: int) outputColumnNames: _col0, _col1, _col25 Statistics: Num rows: 254100 Data size: 365124192 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 254100 Data size: 365124192 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: string), _col25 (type: int) Execution mode: vectorized Map 2 Map Operator Tree: TableScan alias: i filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id is not null) (type: boolean) Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id is not null) (type: boolean) Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: i_item_id (type: string) outputColumnNames: _col0 Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE Column stats: NONE Execution mode: vectorized Map 3 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean) Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE value expressions: d_date_sk (type: int) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: store_sales Statistics: Num rows: 82510879939 Data size: 7203833257964 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {ss_sold_date_sk} {ss_item_sk} {ss_ext_sales_price} 1 {_col0} {_col1} {_col25} keys: 0 1 outputColumnNames: _col0, _col2, _col15, _col27, _col28, _col52 input vertices: 1 Map 1 Statistics: Num rows: 90761969664 Data size: 7924217282560 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((_col2 = _col27) and (_col0 = _col52)) (type: boolean) Statistics: Num rows: 22690492416 Data size: 1981054320640 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col15 (type: float), _col28 (type: string) outputColumnNames: _col15, _col59 Statistics: Num rows: 22690492416 Data size: 1981054320640 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Left Semi Join 0 to 1 condition expressions: 0 {_col15} {_col59} 1 keys: 0 _col59 (type: string) 1 _col0 (type: string) outputColumnNames: _col15, _col59 input vertices: 1 Map 2 Statistics: Num rows: 24959541248 Data size: 2179159818240 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col59 (type: string), _col15 (type: float) outputColumnNames: _col0, _col1 Statistics: Num rows: 24959541248 Data size: 2179159818240 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(_col1) keys: _col0 (type: string) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 24959541248 Data size: 2179159818240 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 24959541248 Data size: 2179159818240 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: double) Execution mode: vectorized Reducer 5 Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string) mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 12479770624 Data size: 1089579909120 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string), _col1 (type: double) outputColumnNames: _col0, _col1 Statistics: Num rows: 12479770624 Data size: 1089579909120 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 12479770624 Data size: 1089579909120 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink
No cross product generated for this query
select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, date_dim, item where i_color in ('purple','burlywood','indian') and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 group by i_item_id;
Attachments
Attachments
Issue Links
- links to