Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
- join between 2 tables; the larger one is partitioned
- a map join is selected
- dynamic partition pruning (DPP) sends events from the small table to the large one
- rollup: the summary row is missing if DPP removes all input partitions
The following query should return a 1-row result.
set hive.auto.convert.join=true;

drop table if exists store_sales_s0;
drop table if exists store_s0;

CREATE TABLE store_sales_s0 (ss_item_sk int,payload string,payload2 string,payload3 string)
PARTITIONED BY (ss_store_sk int)
stored as orc TBLPROPERTIES( 'transactional'='false');

CREATE TABLE store_s0 (s_item_sk int,s_store_sk int,s_state string)
stored as orc TBLPROPERTIES( 'transactional'='false');

insert into store_s0 values (1,10,'XX'), (2,20,'AA'), (3,30,'ZZ') ;
insert into store_sales_s0 partition(ss_store_sk=9) values (1,'xxx','xxx','xxx'),(2,'xxx','xxx','xxx'),(3,'xxx','xxx','xxx'),(4,'xxx','xxx','xxx'),(5,'xxx','xxx','xxx');
insert into store_sales_s0 partition(ss_store_sk=39) values (1,'xxx','xxx','xxx'),(2,'xxx','xxx','xxx'),(3,'xxx','xxx','xxx'),(4,'xxx','xxx','xxx'),(5,'xxx','xxx','xxx');

explain
select grouping(s_state)
from store_s0, store_sales_s0
where ss_store_sk = s_store_sk
  and s_state in ('SD','FL', 'MI', 'LA', 'MO', 'SC')
group by rollup(ss_item_sk, s_state)
order by s_state;

select grouping(s_state)
from store_s0, store_sales_s0
where ss_store_sk = s_store_sk
  and s_state in ('SD','FL', 'MI', 'LA', 'MO', 'SC')
group by rollup(ss_item_sk, s_state)
order by s_state;
explain:
STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Map 2 <- Map 1 (BROADCAST_EDGE)
[...]
#### A masked pattern was here ####
      Vertices:
        Map 1
            Map Operator Tree:
[...]
                      Dynamic Partitioning Event Operator
                        Target column: ss_store_sk (int)
                        Target Input: store_sales_s0
                        Partition key expr: ss_store_sk
                        Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                        Target Vertex: Map 2
            Execution mode: vectorized, llap
            LLAP IO: all inputs
[...]
        Map 2
            Map Operator Tree:
                TableScan
                  alias: store_sales_s0
                  filterExpr: ss_store_sk is not null (type: boolean)
                  Statistics: Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: ss_item_sk (type: int), ss_store_sk (type: int)
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 _col0 (type: int)
                        1 _col1 (type: int)
                      outputColumnNames: _col1, _col2
                      input vertices:
                        0 Map 1
                      Statistics: Num rows: 10 Data size: 900 Basic stats: COMPLETE Column stats: COMPLETE
[...]
            Execution mode: vectorized, llap
            LLAP IO: all inputs
[...]
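
A possible way to check whether DPP is what drops the summary row is to rerun the same query with pruning switched off and compare the results. This is only a diagnostic sketch, not part of the original report: it assumes the session runs on Tez and that hive.tez.dynamic.partition.pruning is the setting that governs the Dynamic Partitioning Event Operator shown in the plan.

```sql
-- Diagnostic sketch (assumption: hive.tez.dynamic.partition.pruning
-- controls the DPP events seen in the plan above).
set hive.auto.convert.join=true;
set hive.tez.dynamic.partition.pruning=false;

-- Same query as in the reproduction, now without DPP events.
select grouping(s_state)
from store_s0, store_sales_s0
where ss_store_sk = s_store_sk
  and s_state in ('SD','FL', 'MI', 'LA', 'MO', 'SC')
group by rollup(ss_item_sk, s_state)
order by s_state;

-- Re-enable pruning afterwards.
set hive.tez.dynamic.partition.pruning=true;
```

If the row count differs between the two runs, that would point at the pruning path rather than at the rollup or the map join itself.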