[HIVE-20570] Union ALL with hive.optimize.union.remove=true has incorrect plan - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0.0-alpha-1
Component/s: None
Labels:
None

Release Note:
Patch pushed to master branch.

Description

When hive.optimize.union.remove=true and a UNION ALL of SELECT queries with GROUP BY is run, the final fetch stage waits for only one of the branches instead of both.

Test Case:

create table if not exists test_table(column1 string, column2 int);
insert into test_table values('a',1),('b',2);

set hive.optimize.union.remove=true;
set mapred.input.dir.recursive=true;

explain
select column1 from test_table group by column1
union all
select column1 from test_table group by column1;

In the plan below, Stage-1 and Stage-2 correspond to the two branches of the UNION ALL. However, the final fetch operator (Stage-0) depends only on Stage-1, although it should depend on both.
Plan:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 is a root stage
  *Stage-0 depends on stages: Stage-1*

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_table
            Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: column1 (type: string)
              outputColumnNames: column1
              Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                keys: column1 (type: string)
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_table
            Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: column1 (type: string)
              outputColumnNames: column1
              Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                keys: column1 (type: string)
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
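The symptom can be checked mechanically by reading the STAGE DEPENDENCIES block and testing whether the fetch stage reaches every other stage. The Python sketch below is not part of Hive or of the attached patches: `parse_stage_dependencies` and `unreached_stages` are hypothetical helpers, they only understand the two line shapes that appear in this plan ("is a root stage" and "depends on stages:"), and they assume, as in this query, that the final fetch should depend (directly or transitively) on every other stage.

```python
import re

# Hypothetical helpers, not Hive APIs. Only the two dependency line shapes
# seen in this issue are recognised.
ROOT_RE = re.compile(r"^(Stage-\d+) is a root stage$")
DEPS_RE = re.compile(r"^(Stage-\d+) depends on stages: (.+)$")


def parse_stage_dependencies(plan_text):
    """Map each stage in the STAGE DEPENDENCIES block to its parent stages."""
    deps = {}
    in_block = False
    for raw in plan_text.splitlines():
        # Drop indentation and JIRA bold markers such as "*Stage-0 ...*".
        line = raw.strip().strip("*").strip()
        if line == "STAGE DEPENDENCIES:":
            in_block = True
        elif line == "STAGE PLANS:":
            break
        elif in_block:
            root = ROOT_RE.match(line)
            dep = DEPS_RE.match(line)
            if root:
                deps[root.group(1)] = set()
            elif dep:
                deps[dep.group(1)] = set(re.findall(r"Stage-\d+", dep.group(2)))
    return deps


def unreached_stages(deps, fetch_stage):
    """Stages that fetch_stage does not depend on, directly or transitively."""
    seen = set()
    stack = [fetch_stage]
    while stack:
        for parent in deps.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return set(deps) - seen - {fetch_stage}


# Dependency block as reported in this issue.
BUGGY = """STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 is a root stage
  *Stage-0 depends on stages: Stage-1*

STAGE PLANS:
"""

# Dependency block the description says is expected.
EXPECTED = """STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-1, Stage-2

STAGE PLANS:
"""

print(sorted(unreached_stages(parse_stage_dependencies(BUGGY), "Stage-0")))     # ['Stage-2']
print(sorted(unreached_stages(parse_stage_dependencies(EXPECTED), "Stage-0")))  # []
```

A transitive walk is used rather than a direct-parent check so that a fetch stage depending on a stage that itself depends on both union branches would still pass.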

Attachments

HIVE-20570.1.patch (8 kB), 17/Sep/18 02:39, Janaki Lahorani
HIVE-20570.2.patch (9 kB), 17/Sep/18 18:43, Janaki Lahorani
HIVE-20570.3.patch (10 kB), 19/Sep/18 00:21, Janaki Lahorani

Issue Links

links to:
RB: Union ALL with hive.optimize.union.remove=true has incorrect plan
