Hive / HIVE-20570

UNION ALL with hive.optimize.union.remove=true has an incorrect plan


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • 4.0.0-alpha-1
    • None
    • None
    • Patch pushed to master branch.

    Description

      When hive.optimize.union.remove=true and a SELECT query with GROUP BY is run over a UNION ALL, the final fetch stage depends on only one of the two union branches instead of both.

      Test Case:

      create table if not exists test_table(column1 string, column2 int);
      insert into test_table values('a',1),('b',2);
      
      set hive.optimize.union.remove=true;
      set mapred.input.dir.recursive=true;
      
      explain
      select column1 from test_table group by column1
      union all
      select column1 from test_table group by column1;
      

      In the plan below, Stage-1 and Stage-2 correspond to the two branches of the UNION ALL. However, the final fetch operator (Stage-0) depends only on Stage-1, when it should depend on both.
      Plan:

      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-2 is a root stage
        *Stage-0 depends on stages: Stage-1*
      
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Map Operator Tree:
                TableScan
                  alias: test_table
                  Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: column1 (type: string)
                    outputColumnNames: column1
                    Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      keys: column1 (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            Execution mode: vectorized
            Reduce Operator Tree:
              Group By Operator
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-2
          Map Reduce
            Map Operator Tree:
                TableScan
                  alias: test_table
                  Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: column1 (type: string)
                    outputColumnNames: column1
                    Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      keys: column1 (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            Execution mode: vectorized
            Reduce Operator Tree:
              Group By Operator
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
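      To check a plan for this symptom mechanically, the STAGE DEPENDENCIES section can be parsed and the fetch stage's dependencies (followed transitively) compared with the stages that implement the union branches. This is a minimal sketch and not part of Hive: the helper names are hypothetical, and it assumes the line formats seen in the plan above ("Stage-N is a root stage", "Stage-N depends on stages: ..."), including the JIRA bold markers around the Stage-0 line.

```python
import re


def parse_stage_dependencies(plan_text):
    """Hypothetical helper: map each stage to the set of stages it depends on."""
    deps = {}
    for raw in plan_text.splitlines():
        line = raw.strip().strip("*")  # drop indentation and JIRA bold markers
        m = re.match(r"(Stage-\d+) is a root stage$", line)
        if m:
            deps[m.group(1)] = set()
            continue
        m = re.match(r"(Stage-\d+) depends on stages: (.+)$", line)
        if m:
            # Ignore any ", consists of ..." suffix (used for conditional stages).
            parents = m.group(2).split("consists of")[0]
            deps[m.group(1)] = set(re.findall(r"Stage-\d+", parents))
    return deps


def ancestors(deps, stage):
    """All stages that `stage` waits for, directly or transitively."""
    seen, stack = set(), list(deps.get(stage, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(deps.get(s, ()))
    return seen


def missing_branch_dependencies(deps, fetch_stage, branch_stages):
    """Hypothetical check: union branches the fetch stage does not wait for."""
    return sorted(set(branch_stages) - ancestors(deps, fetch_stage))


# Dependency section copied from the incorrect plan in this issue.
BUGGY_PLAN = """
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 is a root stage
  *Stage-0 depends on stages: Stage-1*
"""

deps = parse_stage_dependencies(BUGGY_PLAN)
print(missing_branch_dependencies(deps, "Stage-0", ["Stage-1", "Stage-2"]))  # ['Stage-2']
```

      An empty list means the fetch stage waits for every branch. Following dependencies transitively avoids a false alarm when the fetch stage reaches a branch stage through an intermediate stage rather than directly.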
      

      Attachments

        1. HIVE-20570.1.patch (8 kB, Janaki Lahorani)
        2. HIVE-20570.2.patch (9 kB, Janaki Lahorani)
        3. HIVE-20570.3.patch (10 kB, Janaki Lahorani)


          People

            Assignee: Janaki Lahorani (janulatha)
            Reporter: Janaki Lahorani (janulatha)
            Votes: 0
            Watchers: 2
