Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Patch pushed to master branch.
Description
When hive.optimize.union.remove=true and a union all query is run whose branches each contain a group by, the final fetch stage waits for only one of the branches instead of both.
Test Case:
create table if not exists test_table(column1 string, column2 int);
insert into test_table values('a',1),('b',2);
set hive.optimize.union.remove=true;
set mapred.input.dir.recursive=true;
explain
select column1 from test_table group by column1
union all
select column1 from test_table group by column1;
In the plan below, Stage-1 and Stage-2 correspond to the two branches of the union all. The final fetch operator (Stage-0) depends on only one of them (Stage-1), but it should depend on both.
Plan:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 is a root stage
  *Stage-0 depends on stages: Stage-1*

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_table
            Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: column1 (type: string)
              outputColumnNames: column1
              Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                keys: column1 (type: string)
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_table
            Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: column1 (type: string)
              outputColumnNames: column1
              Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                keys: column1 (type: string)
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
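The dependency problem in this plan can be checked mechanically. The following is a minimal sketch, not part of the patch: it parses only the two line shapes that appear in the STAGE DEPENDENCIES section above ("is a root stage" and "depends on stages:") and reports every stage the fetch stage does not wait for, directly or transitively. The function names and the EXPECTED text are hypothetical; EXPECTED is an assumption about what the dependency section would look like if Stage-0 depended on both branches, as the description says it should.

```python
import re

# STAGE DEPENDENCIES section as reported in this issue.
REPORTED = """\
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 is a root stage
  *Stage-0 depends on stages: Stage-1*

STAGE PLANS:
"""

# Assumed shape of a correct plan (hypothetical, not taken from the issue).
EXPECTED = """\
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-1, Stage-2

STAGE PLANS:
"""


def parse_stage_dependencies(explain_text):
    """Return {stage: set of direct parent stages} from EXPLAIN text."""
    deps = {}
    in_section = False
    for raw in explain_text.splitlines():
        # Drop indentation and the '*' emphasis markers used in the issue text.
        line = raw.strip().strip("*").strip()
        if line.startswith("STAGE DEPENDENCIES"):
            in_section = True
            continue
        if line.startswith("STAGE PLANS"):
            break
        if not in_section or not line:
            continue
        root = re.match(r"(Stage-\d+) is a root stage", line)
        if root:
            deps[root.group(1)] = set()
            continue
        dep = re.match(r"(Stage-\d+) depends on stages: (.+)", line)
        if dep:
            deps[dep.group(1)] = set(re.findall(r"Stage-\d+", dep.group(2)))
    return deps


def stages_not_awaited(deps, fetch_stage="Stage-0"):
    """Return stages that the fetch stage does not depend on, even transitively."""
    seen = set()
    stack = list(deps.get(fetch_stage, ()))
    while stack:
        stage = stack.pop()
        if stage not in seen:
            seen.add(stage)
            stack.extend(deps.get(stage, ()))
    return set(deps) - seen - {fetch_stage}


if __name__ == "__main__":
    print(sorted(stages_not_awaited(parse_stage_dependencies(REPORTED))))  # ['Stage-2']
    print(sorted(stages_not_awaited(parse_stage_dependencies(EXPECTED))))  # []
```

For the reported plan the check flags Stage-2, the union branch that the fetch operator never waits for. Other line shapes that Hive may print in this section are not handled by this sketch.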