Hive / HIVE-3331

plan for union all followed by mapjoin may be wrong

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None

      Description

      Test query:

      explain create table union_mapjoin as
      select * from (
        select /*+ MAPJOIN(a) */ a.key keya, a.value valuea, b2.key b2key, b2.value b2value
        from join_src a
        join (
          select key, value from join_src where key = 11 limit 1
          union all
          select key, value from join_src where key = 22 limit 1
        ) b2
        on a.key = b2.key
      ) sub
      limit 10000;
      

      The plan I got:

      
      STAGE DEPENDENCIES:
        Stage-4 is a root stage
        Stage-6 is a root stage
        Stage-7 is a root stage
        Stage-1 depends on stages: Stage-7
        Stage-2 depends on stages: Stage-1
        Stage-0 depends on stages: Stage-2
        Stage-8 depends on stages: Stage-0
        Stage-3 depends on stages: Stage-8
      
      STAGE PLANS:
        Stage: Stage-4
          Map Reduce
            Alias -> Map Operator Tree:
              sub-subquery1:b2-subquery1:join_src 
                TableScan
                  alias: join_src
                  Filter Operator
                    predicate:
                        expr: (key = 11)
                        type: boolean
                    Select Operator
                      expressions:
                            expr: key
                            type: string
                            expr: value
                            type: string
                      outputColumnNames: _col0, _col1
                      Limit
                        Reduce Output Operator
                          sort order: 
                          tag: -1
                          value expressions:
                                expr: _col0
                                type: string
                                expr: _col1
                                type: string
            Reduce Operator Tree:
              Extract
                Limit
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      
        Stage: Stage-6
          Map Reduce
            Alias -> Map Operator Tree:
              sub-subquery2:b2-subquery2:join_src 
                TableScan
                  alias: join_src
                  Filter Operator
                    predicate:
                        expr: (key = 22)
                        type: boolean
                    Select Operator
                      expressions:
                            expr: key
                            type: string
                            expr: value
                            type: string
                      outputColumnNames: _col0, _col1
                      Limit
                        Reduce Output Operator
                          sort order: 
                          tag: -1
                          value expressions:
                                expr: _col0
                                type: string
                                expr: _col1
                                type: string
            Reduce Operator Tree:
              Extract
                Limit
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      
        Stage: Stage-7
          Map Reduce Local Work
            Alias -> Map Local Tables:
              sub:a 
                Fetch Operator
                  limit: -1
            Alias -> Map Local Operator Tree:
              sub:a 
                TableScan
                  alias: a
                  HashTable Sink Operator
                    condition expressions:
                      0 {key} {value}
                      1 {_col0} {_col1}
                    handleSkewJoin: false
                    keys:
                      0 [Column[key]]
                      1 [Column[_col0]]
                    Position of Big Table: 1
      
        Stage: Stage-1
          Map Reduce
            Local Work:
              Map Reduce Local Work
      
        Stage: Stage-2
          Map Reduce
            Alias -> Map Operator Tree:
              file:/tmp/hive-xinyu/hive_2012-08-03_17-07-18_694_3666415239747373912/-mr-10002 
                Select Operator
                  expressions:
                        expr: _col0
                        type: string
                        expr: _col1
                        type: string
                        expr: _col4
                        type: string
                        expr: _col5
                        type: string
                  outputColumnNames: _col0, _col1, _col4, _col5
                  Select Operator
                    expressions:
                          expr: _col0
                          type: string
                          expr: _col1
                          type: string
                          expr: _col4
                          type: string
                          expr: _col5
                          type: string
                    outputColumnNames: _col0, _col1, _col2, _col3
                    Select Operator
                      expressions:
                            expr: _col0
                            type: string
                            expr: _col1
                            type: string
                            expr: _col2
                            type: string
                            expr: _col3
                            type: string
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Limit
                        Reduce Output Operator
                          sort order: 
                          tag: -1
                          value expressions:
                                expr: _col0
                                type: string
                                expr: _col1
                                type: string
                                expr: _col2
                                type: string
                                expr: _col3
                                type: string
            Reduce Operator Tree:
              Extract
                Limit
                  File Output Operator
                    compressed: false
                    GlobalTableId: 1
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        name: default.union_mapjoin
      
        Stage: Stage-0
          Move Operator
            <...>
        Stage: Stage-8
            Create Table Operator:
            <...>
      

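      What is wrong with Stage-1? It contains only Local Work and has no Map Operator Tree, and Stage-4 and Stage-6 (the two union branches) are root stages that no other stage depends on.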

      In code (about line 362):

      ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinFactory.java
      
            // If the plan for this reducer does not exist, initialize the plan
            if (opMapTask == null) {
              assert currPlan.getReducer() == null;
              ctx.setCurrMapJoinOp(mapJoin);
              GenMapRedUtils.initMapJoinPlan(mapJoin, ctx, true, true, false, pos);
              ctx.setCurrUnionOp(null);
            } else {
              // The current plan can be thrown away after being merged with the
              // original plan
              // Is a call to ctx.setCurrMapJoinOp(mapJoin) missing here?
              Task<? extends Serializable> uTask = ctx.getUnionTask(
                  ctx.getCurrUnionOp()).getUTask();
              if (uTask.getId().equals(opMapTask.getId())) {
                GenMapRedUtils.joinPlan(mapJoin, null, opMapTask, ctx, pos, false,
                    false, true);
              } else {
                GenMapRedUtils.joinPlan(mapJoin, uTask, opMapTask, ctx, pos, false,
                    false, true);
              }
              currTask = opMapTask;
              ctx.setCurrTask(currTask);
            }
      

      Does the "else" block forget to call ctx.setCurrMapJoinOp(mapJoin)? When I patched the code to add that call there, I got the plan I wanted.

      Plan after my modification:

      STAGE DEPENDENCIES:
        Stage-4 is a root stage
        Stage-7 depends on stages: Stage-4, Stage-6
        Stage-1 depends on stages: Stage-7
        Stage-2 depends on stages: Stage-1
        Stage-0 depends on stages: Stage-2
        Stage-8 depends on stages: Stage-0
        Stage-3 depends on stages: Stage-8
        Stage-6 is a root stage
      
      STAGE PLANS:
        Stage: Stage-4
          Map Reduce
            Alias -> Map Operator Tree:
              sub-subquery1:b2-subquery1:join_src 
                TableScan
                  alias: join_src
                  Filter Operator
                    predicate:
                        expr: (key = 11)
                        type: boolean
                    Select Operator
                      expressions:
                            expr: key
                            type: string
                            expr: value
                            type: string
                      outputColumnNames: _col0, _col1
                      Limit
                        Reduce Output Operator
                          sort order: 
                          tag: -1
                          value expressions:
                                expr: _col0
                                type: string
                                expr: _col1
                                type: string
            Reduce Operator Tree:
              Extract
                Limit
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      
        Stage: Stage-7
          Map Reduce Local Work
            Alias -> Map Local Tables:
              sub:a 
                Fetch Operator
                  limit: -1
            Alias -> Map Local Operator Tree:
              sub:a 
                TableScan
                  alias: a
                  HashTable Sink Operator
                    condition expressions:
                      0 {key} {value}
                      1 {_col0} {_col1}
                    handleSkewJoin: false
                    keys:
                      0 [Column[key]]
                      1 [Column[_col0]]
                    Position of Big Table: 1
      
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              file:/tmp/hive-xinyu/hive_2012-08-03_17-28-19_000_5428590854627305209/-mr-10003 
                Union
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    condition expressions:
                      0 {key} {value}
                      1 {_col0} {_col1}
                    handleSkewJoin: false
                    keys:
                      0 [Column[key]]
                      1 [Column[_col0]]
                    outputColumnNames: _col0, _col1, _col4, _col5
                    Position of Big Table: 1
                    File Output Operator
                      compressed: false
                      GlobalTableId: 0
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
              file:/tmp/hive-xinyu/hive_2012-08-03_17-28-19_000_5428590854627305209/-mr-10004 
                Union
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    condition expressions:
                      0 {key} {value}
                      1 {_col0} {_col1}
                    handleSkewJoin: false
                    keys:
                      0 [Column[key]]
                      1 [Column[_col0]]
                    outputColumnNames: _col0, _col1, _col4, _col5
                    Position of Big Table: 1
                    File Output Operator
                      compressed: false
                      GlobalTableId: 0
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
            Local Work:
              Map Reduce Local Work
      
        Stage: Stage-2
          Map Reduce
            Alias -> Map Operator Tree:
              file:/tmp/hive-xinyu/hive_2012-08-03_17-28-19_000_5428590854627305209/-mr-10002 
                Select Operator
                  expressions:
                        expr: _col0
                        type: string
                        expr: _col1
                        type: string
                        expr: _col4
                        type: string
                        expr: _col5
                        type: string
                  outputColumnNames: _col0, _col1, _col4, _col5
                  Select Operator
                    expressions:
                          expr: _col0
                          type: string
                          expr: _col1
                          type: string
                          expr: _col4
                          type: string
                          expr: _col5
                          type: string
                    outputColumnNames: _col0, _col1, _col2, _col3
                    Select Operator
                      expressions:
                            expr: _col0
                            type: string
                            expr: _col1
                            type: string
                            expr: _col2
                            type: string
                            expr: _col3
                            type: string
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Limit
                        Reduce Output Operator
                          sort order: 
                          tag: -1
                          value expressions:
                                expr: _col0
                                type: string
                                expr: _col1
                                type: string
                                expr: _col2
                                type: string
                                expr: _col3
                                type: string
            Reduce Operator Tree:
              Extract
                Limit
                  File Output Operator
                    compressed: false
                    GlobalTableId: 1
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        name: default.union_mapjoin
      
        Stage: Stage-0
          Move Operator
            <...>
      
        Stage: Stage-8
            Create Table Operator:
            <...>
      
        Stage: Stage-3
          Stats-Aggr Operator
      
        Stage: Stage-6
          Map Reduce
            Alias -> Map Operator Tree:
              sub-subquery2:b2-subquery2:join_src 
                TableScan
                  alias: join_src
                  Filter Operator
                    predicate:
                        expr: (key = 22)
                        type: boolean
                    Select Operator
                      expressions:
                            expr: key
                            type: string
                            expr: value
                            type: string
                      outputColumnNames: _col0, _col1
                      Limit
                        Reduce Output Operator
                          sort order: 
                          tag: -1
                          value expressions:
                                expr: _col0
                                type: string
                                expr: _col1
                                type: string
            Reduce Operator Tree:
              Extract
                Limit
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      
      

          People

          • Assignee: Unassigned
          • Reporter: Zhang Xinyu