Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
0.14.0
-
None
Description
Reducer Vertices sending data to a Union all edge are missing statistics and as a result we either use very few reducers in the UNION ALL edge or decide to broadcast the results of UNION ALL.
Query
select count(*) rowcount from (select ss_item_sk, ss_ticket_number, ss_store_sk from store_sales a, store_returns b where a.ss_item_sk = b.sr_item_sk and a.ss_ticket_number = b.sr_ticket_number union all select ss_item_sk, ss_ticket_number, ss_store_sk from store_sales c, store_returns d where c.ss_item_sk = d.sr_item_sk and c.ss_ticket_number = d.sr_ticket_number) t group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number having rowcount > 100000000;
Plan snippet
Edges: Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS) Reducer 4 <- Union 3 (SIMPLE_EDGE) Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS) Reducer 4 Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0) keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int) mode: mergepartial outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col3 > 100000000) (type: boolean) Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE Select Operator expressions: _col3 (type: bigint) outputColumnNames: _col0 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Reducer 7 Reduce Operator Tree: Merge Join Operator condition map: Inner Join 0 to 1 keys: 0 ss_item_sk (type: int), ss_ticket_number (type: int) 1 sr_item_sk (type: int), sr_ticket_number (type: int) outputColumnNames: _col1, _col6, _col8, _col27, _col34 Filter Operator predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean) Select Operator expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int) outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: count() keys: _col2 (type: int), _col0 (type: int), _col1 (type: int) mode: hash outputColumnNames: _col0, _col1, _col2, _col3 Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int) sort order: +++ Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int) value expressions: _col3 (type: bigint)
The full explain plan
STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS) Reducer 4 <- Union 3 (SIMPLE_EDGE) Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS) DagName: mmokhtar_20150214132727_95878ea1-ee6a-4b7e-bc86-843abd5cf664:7 Vertices: Map 1 Map Operator Tree: TableScan alias: a filterExpr: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean) Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean) Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: ss_item_sk (type: int), ss_ticket_number (type: int) sort order: ++ Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int) Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE value expressions: ss_store_sk (type: int) Map 5 Map Operator Tree: TableScan alias: b filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean) Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean) Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: sr_item_sk (type: int), sr_ticket_number (type: int) sort order: ++ Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int) Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE Map 6 Map Operator Tree: TableScan alias: c filterExpr: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean) Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean) Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: ss_item_sk (type: int), ss_ticket_number (type: int) sort order: ++ Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int) Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE value expressions: ss_store_sk (type: int) Map 8 Map Operator Tree: TableScan alias: d filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean) Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean) Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: sr_item_sk (type: int), sr_ticket_number (type: int) sort order: ++ Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int) Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE Reducer 2 Reduce Operator Tree: Merge Join Operator condition map: Inner Join 0 to 1 keys: 0 ss_item_sk (type: int), ss_ticket_number (type: int) 1 sr_item_sk (type: int), sr_ticket_number (type: int) outputColumnNames: _col1, _col6, _col8, _col27, _col34 Filter Operator predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean) Select Operator expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int) outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: count() keys: _col2 (type: int), _col0 (type: int), _col1 (type: int) mode: hash outputColumnNames: _col0, _col1, _col2, _col3 Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int) sort order: +++ Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int) value expressions: _col3 (type: bigint) Reducer 4 Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0) keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int) mode: mergepartial outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col3 > 100000000) (type: boolean) Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE Select Operator expressions: _col3 (type: bigint) outputColumnNames: _col0 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Reducer 7 Reduce Operator Tree: Merge Join Operator condition map: Inner Join 0 to 1 keys: 0 ss_item_sk (type: int), ss_ticket_number (type: int) 1 sr_item_sk (type: int), sr_ticket_number (type: int) outputColumnNames: _col1, _col6, _col8, _col27, _col34 Filter Operator predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean) Select Operator expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int) outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: count() keys: _col2 (type: int), _col0 (type: int), _col1 (type: int) mode: hash outputColumnNames: _col0, _col1, _col2, _col3 Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int) sort order: +++ Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int) value expressions: _col3 (type: bigint) Union 3 Vertex: Union 3 Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink
Also TPC-DS Q54 fails with OOM, this failure happens when we chose a different plan.
The OOM happens in vertexName=Map 14
explain with my_customers as ( select c_customer_sk , c_current_addr_sk from ( select cs_sold_date_sk sold_date_sk, cs_bill_customer_sk customer_sk, cs_item_sk item_sk from catalog_sales union all select ws_sold_date_sk sold_date_sk, ws_bill_customer_sk customer_sk, ws_item_sk item_sk from web_sales ) cs_or_ws_sales, item, date_dim, customer where sold_date_sk = d_date_sk and item_sk = i_item_sk and i_category = 'Jewelry' and i_class = 'football' and c_customer_sk = cs_or_ws_sales.customer_sk and d_moy = 3 and d_year = 2000 group by c_customer_sk , c_current_addr_sk ) , my_revenue as ( select c_customer_sk, sum(ss_ext_sales_price) as revenue from my_customers, store_sales, customer_address, store, date_dim where c_current_addr_sk = ca_address_sk and ca_county = s_county and ca_state = s_state and ss_sold_date_sk = d_date_sk and c_customer_sk = ss_customer_sk and d_month_seq between (1203) and (1205) group by c_customer_sk ) , segments as (select cast((revenue/50) as int) as segment from my_revenue ) select segment, count(*) as num_customers, segment*50 as segment_base from segments group by segment order by segment, num_customers limit 100 OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Map 10 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS) Map 12 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS) Map 14 <- Union 11 (BROADCAST_EDGE) Map 6 <- Map 7 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE) Map 8 <- Map 14 (BROADCAST_EDGE) Reducer 2 <- Map 1 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) Reducer 4 <- Reducer 3 (SIMPLE_EDGE) Reducer 9 <- Map 8 (SIMPLE_EDGE) DagName: mmokhtar_20150208232525_9976b56b-8f4b-48c8-a909-aa653c20051c:1 Vertices: Map 1 Map Operator Tree: TableScan alias: store_sales filterExpr: ss_customer_sk is not null (type: boolean) Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ss_customer_sk is not null (type: boolean) Statistics: Num rows: 80566020964 Data size: 951594129356 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ss_customer_sk (type: int), ss_ext_sales_price (type: float), ss_sold_date_sk (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 80566020964 Data size: 951594129356 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col2 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col1 input vertices: 1 Map 5 Statistics: Num rows: 90081226648 Data size: 720649813184 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col0 (type: int) 1 _col5 (type: int) outputColumnNames: _col1, _col10 input vertices: 1 Map 6 Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col10 (type: int), _col1 (type: float) outputColumnNames: _col0, _col1 Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(_col1) keys: _col0 (type: int) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: double) Execution mode: vectorized Map 10 Map Operator Tree: TableScan alias: catalog_sales filterExpr: (cs_item_sk is not null and cs_bill_customer_sk is not null) (type: boolean) Filter Operator predicate: (cs_item_sk is not null and cs_bill_customer_sk is not null) (type: boolean) Select Operator expressions: cs_sold_date_sk (type: int), cs_bill_customer_sk (type: int), cs_item_sk (type: int) outputColumnNames: _col0, _col1, _col2 Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col0 (type: int) 1 _col0 (type: int) outputColumnNames: _col1, _col2 input vertices: 1 Map 13 Reduce Output Operator key expressions: _col2 (type: int) sort order: + Map-reduce partition columns: _col2 (type: int) value expressions: _col1 (type: int) Execution mode: vectorized Map 12 Map Operator Tree: TableScan alias: web_sales filterExpr: (ws_item_sk is not null and ws_bill_customer_sk is not null) (type: boolean) Filter Operator predicate: (ws_item_sk is not null and ws_bill_customer_sk is not null) (type: boolean) Select Operator expressions: ws_sold_date_sk (type: int), ws_bill_customer_sk (type: int), ws_item_sk (type: int) outputColumnNames: _col0, _col1, _col2 Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col0 (type: int) 1 _col0 (type: int) outputColumnNames: _col1, _col2 input vertices: 1 Map 13 Reduce Output Operator key expressions: _col2 (type: int) sort order: + Map-reduce partition columns: _col2 (type: int) value expressions: _col1 (type: int) Execution mode: vectorized Map 13 Map Operator Tree: TableScan alias: date_dim filterExpr: (((d_moy = 3) and (d_year = 2000)) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((d_moy = 3) and (d_year = 2000)) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 624 Data size: 7488 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int) outputColumnNames: _col0 Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator keys: _col0 (type: int) mode: hash outputColumnNames: _col0 Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE Dynamic Partitioning Event Operator Target Input: catalog_sales Partition key expr: cs_sold_date_sk Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE Target column: cs_sold_date_sk Target Vertex: Map 10 Select Operator expressions: _col0 (type: int) outputColumnNames: _col0 Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator keys: _col0 (type: int) mode: hash outputColumnNames: _col0 Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE Dynamic Partitioning Event Operator Target Input: web_sales Partition key expr: ws_sold_date_sk Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE Target column: ws_sold_date_sk Target Vertex: Map 12 Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 14 Map Operator Tree: TableScan alias: item filterExpr: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 4200 Data size: 781200 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: i_item_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 4200 Data size: 16800 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col2 (type: int) 1 _col0 (type: int) outputColumnNames: _col1 input vertices: 0 Union 11 Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE Reduce Output Operator key expressions: _col1 (type: int) sort order: + Map-reduce partition columns: _col1 (type: int) Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE Execution mode: vectorized Map 5 Map Operator Tree: TableScan alias: date_dim filterExpr: (d_month_seq BETWEEN 1203 AND 1205 and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (d_month_seq BETWEEN 1203 AND 1205 and d_date_sk is not null) (type: boolean) Statistics: Num rows: 36524 Data size: 292192 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int) outputColumnNames: _col0 Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator keys: _col0 (type: int) mode: hash outputColumnNames: _col0 Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE Column stats: COMPLETE Dynamic Partitioning Event Operator Target Input: store_sales Partition key expr: ss_sold_date_sk Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE Column stats: COMPLETE Target column: ss_sold_date_sk Target Vertex: Map 1 Execution mode: vectorized Map 6 Map Operator Tree: TableScan alias: customer_address filterExpr: ((ca_county is not null and ca_state is not null) and ca_address_sk is not null) (type: boolean) Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((ca_county is not null and ca_state is not null) and ca_address_sk is not null) (type: boolean) Statistics: Num rows: 40000000 Data size: 7520000000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ca_address_sk (type: int), ca_county (type: string), ca_state (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 40000000 Data size: 7520000000 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col1 (type: string), _col2 (type: string) 1 _col0 (type: string), _col1 (type: string) outputColumnNames: _col0 input vertices: 1 Map 7 Statistics: Num rows: 778829 Data size: 3115316 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col0 (type: int) 1 _col1 (type: int) outputColumnNames: _col5 input vertices: 1 Reducer 9 Statistics: Num rows: 47909545988 Data size: 0 Basic stats: PARTIAL Column stats: NONE Reduce Output Operator key expressions: _col5 (type: int) sort order: + Map-reduce partition columns: _col5 (type: int) Statistics: Num rows: 47909545988 Data size: 0 Basic stats: PARTIAL Column stats: NONE Execution mode: vectorized Map 7 Map Operator Tree: TableScan alias: store filterExpr: (s_county is not null and s_state is not null) (type: boolean) Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (s_county is not null and s_state is not null) (type: boolean) Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: s_county (type: string), s_state (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: string), _col1 (type: string) sort order: ++ Map-reduce partition columns: _col0 (type: string), _col1 (type: string) Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 8 Map Operator Tree: TableScan alias: customer filterExpr: (c_customer_sk is not null and c_current_addr_sk is not null) (type: boolean) Statistics: Num rows: 80000000 Data size: 68801615852 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (c_customer_sk is not null and c_current_addr_sk is not null) (type: boolean) Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: c_customer_sk (type: int), c_current_addr_sk (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col0 (type: int) 1 _col1 (type: int) outputColumnNames: _col0, _col1 input vertices: 1 Map 14 Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL Column stats: NONE Group By Operator keys: _col0 (type: int), _col1 (type: int) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int) sort order: ++ Map-reduce partition columns: _col0 (type: int), _col1 (type: int) Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL Column stats: NONE Execution mode: vectorized Reducer 2 Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: int) mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: UDFToInteger((_col1 / 50.0)) (type: int) outputColumnNames: _col0 Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: count() keys: _col0 (type: int) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: bigint) Reducer 3 Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0) keys: KEY._col0 (type: int) mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: int), _col1 (type: bigint), (_col0 * 50) (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: bigint) sort order: ++ Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE TopN Hash Memory Usage: 0.04 value expressions: _col2 (type: int) Reducer 4 Reduce Operator Tree: Select Operator expressions: KEY.reducesinkkey0 (type: int), KEY.reducesinkkey1 (type: bigint), VALUE._col0 (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE Limit Number of rows: 100 Statistics: Num rows: 100 Data size: 800 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 100 Data size: 800 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Reducer 9 Reduce Operator Tree: Group By Operator keys: KEY._col0 (type: int), KEY._col1 (type: int) mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 43554131773 Data size: 0 Basic stats: PARTIAL Column stats: NONE Reduce Output Operator key expressions: _col1 (type: int) sort order: + Map-reduce partition columns: _col1 (type: int) Statistics: Num rows: 43554131773 Data size: 0 Basic stats: PARTIAL Column stats: NONE value expressions: _col0 (type: int) Union 11 Vertex: Union 11 Stage: Stage-0 Fetch Operator limit: 100 Processor Tree: ListSink
In Map 14 Data size is 0
p 14 Map Operator Tree: TableScan alias: item filterExpr: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 4200 Data size: 781200 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: i_item_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 4200 Data size: 16800 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col2 (type: int) 1 _col0 (type: int) outputColumnNames: _col1 input vertices: 0 Union 11 Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE Reduce Output Operator key expressions: _col1 (type: int) sort order: + Map-reduce partition columns: _col1 (type: int) Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE Execution mode: vectorized
Attachments
Issue Links
- is related to
-
HIVE-9392 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
- Resolved