Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.8.1
None
None
Description
Test query:
explain create table union_mapjoin as
select * from (
  select /*+ MAPJOIN(a) */ a.key keya, a.value valuea, b2.key b2key, b2.value b2value
  from join_src a
  join (
    select key, value from join_src where key = 11 limit 1
    union all
    select key, value from join_src where key = 22 limit 1
  ) b2
  on a.key = b2.key
) sub limit 10000;
I got the following plan:
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-6 is a root stage
  Stage-7 is a root stage
  Stage-1 depends on stages: Stage-7
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-8 depends on stages: Stage-0
  Stage-3 depends on stages: Stage-8

STAGE PLANS:
  Stage: Stage-4
    Map Reduce
      Alias -> Map Operator Tree:
        sub-subquery1:b2-subquery1:join_src
          TableScan
            alias: join_src
            Filter Operator
              predicate:
                expr: (key = 11)
                type: boolean
              Select Operator
                expressions:
                  expr: key
                  type: string
                  expr: value
                  type: string
                outputColumnNames: _col0, _col1
                Limit
                  Reduce Output Operator
                    sort order:
                    tag: -1
                    value expressions:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: string
      Reduce Operator Tree:
        Extract
          Limit
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

  Stage: Stage-6
    Map Reduce
      Alias -> Map Operator Tree:
        sub-subquery2:b2-subquery2:join_src
          TableScan
            alias: join_src
            Filter Operator
              predicate:
                expr: (key = 22)
                type: boolean
              Select Operator
                expressions:
                  expr: key
                  type: string
                  expr: value
                  type: string
                outputColumnNames: _col0, _col1
                Limit
                  Reduce Output Operator
                    sort order:
                    tag: -1
                    value expressions:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: string
      Reduce Operator Tree:
        Extract
          Limit
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

  Stage: Stage-7
    Map Reduce Local Work
      Alias -> Map Local Tables:
        sub:a
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        sub:a
          TableScan
            alias: a
            HashTable Sink Operator
              condition expressions:
                0 {key} {value}
                1 {_col0} {_col1}
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[_col0]]
              Position of Big Table: 1

  Stage: Stage-1
    Map Reduce
      Local Work:
        Map Reduce Local Work

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        file:/tmp/hive-xinyu/hive_2012-08-03_17-07-18_694_3666415239747373912/-mr-10002
          Select Operator
            expressions:
              expr: _col0
              type: string
              expr: _col1
              type: string
              expr: _col4
              type: string
              expr: _col5
              type: string
            outputColumnNames: _col0, _col1, _col4, _col5
            Select Operator
              expressions:
                expr: _col0
                type: string
                expr: _col1
                type: string
                expr: _col4
                type: string
                expr: _col5
                type: string
              outputColumnNames: _col0, _col1, _col2, _col3
              Select Operator
                expressions:
                  expr: _col0
                  type: string
                  expr: _col1
                  type: string
                  expr: _col2
                  type: string
                  expr: _col3
                  type: string
                outputColumnNames: _col0, _col1, _col2, _col3
                Limit
                  Reduce Output Operator
                    sort order:
                    tag: -1
                    value expressions:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: string
                      expr: _col2
                      type: string
                      expr: _col3
                      type: string
      Reduce Operator Tree:
        Extract
          Limit
            File Output Operator
              compressed: false
              GlobalTableId: 1
              table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                name: default.union_mapjoin

  Stage: Stage-0
    Move Operator
      <...>

  Stage: Stage-8
    Create Table Operator:
      <...>
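What's wrong with Stage-1? It contains only "Local Work: Map Reduce Local Work" and no Alias -> Map Operator Tree, so the map join itself never appears in the plan. In addition, Stage-4 and Stage-6 (the two union branches) are root stages that no other stage depends on.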
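One visible symptom in the first plan is that Stage-4 and Stage-6, the two union branches, are root stages that nothing else depends on, whereas in the patched plan Stage-7 depends on both. That can be checked mechanically from the STAGE DEPENDENCIES section alone. The following is a minimal sketch, not part of Hive: StageDeps and disconnected are hypothetical names, it assumes the dependency lines look exactly like the ones in these plans, and it treats Stage-3 (the last stage in the dependency chain) as the final stage. It walks dependencies backwards from the final stage and reports every listed stage that is never reached.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: reads "STAGE DEPENDENCIES" lines
// from EXPLAIN output and reports stages the final stage never depends on.
class StageDeps {
    private static final Pattern DEPENDS =
            Pattern.compile("(Stage-\\d+) depends on stages: (.+)");
    private static final Pattern ROOT =
            Pattern.compile("(Stage-\\d+) is a root stage");

    // Returns, sorted, every listed stage that is neither finalStage nor a
    // transitive dependency of it.
    static Set<String> disconnected(List<String> lines, String finalStage) {
        Map<String, List<String>> parents = new HashMap<>();
        Set<String> all = new TreeSet<>();
        for (String line : lines) {
            String s = line.trim();
            Matcher d = DEPENDS.matcher(s);
            Matcher r = ROOT.matcher(s);
            if (d.matches()) {
                List<String> deps = new ArrayList<>();
                for (String p : d.group(2).split(",")) {
                    deps.add(p.trim());
                }
                parents.put(d.group(1), deps);
                all.add(d.group(1));
                all.addAll(deps);
            } else if (r.matches()) {
                all.add(r.group(1));
            }
        }
        // Walk backwards from the final stage through its dependencies.
        Set<String> reached = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(finalStage);
        while (!todo.isEmpty()) {
            String stage = todo.pop();
            if (reached.add(stage)) {
                for (String p : parents.getOrDefault(stage, List.of())) {
                    todo.push(p);
                }
            }
        }
        all.removeAll(reached);
        return all;
    }

    public static void main(String[] args) {
        List<String> original = List.of(
                "Stage-4 is a root stage",
                "Stage-6 is a root stage",
                "Stage-7 is a root stage",
                "Stage-1 depends on stages: Stage-7",
                "Stage-2 depends on stages: Stage-1",
                "Stage-0 depends on stages: Stage-2",
                "Stage-8 depends on stages: Stage-0",
                "Stage-3 depends on stages: Stage-8");
        List<String> patched = List.of(
                "Stage-4 is a root stage",
                "Stage-7 depends on stages: Stage-4, Stage-6",
                "Stage-1 depends on stages: Stage-7",
                "Stage-2 depends on stages: Stage-1",
                "Stage-0 depends on stages: Stage-2",
                "Stage-8 depends on stages: Stage-0",
                "Stage-3 depends on stages: Stage-8",
                "Stage-6 is a root stage");
        System.out.println(disconnected(original, "Stage-3")); // [Stage-4, Stage-6]
        System.out.println(disconnected(patched, "Stage-3"));  // []
    }
}
```

Fed the dependency lines of the two plans, it reports Stage-4 and Stage-6 for the original plan and nothing for the patched one. It only looks at the dependency graph, so it cannot tell whether a connected stage has the right operator tree.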
In ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinFactory.java (around line 362):
// If the plan for this reducer does not exist, initialize the plan
if (opMapTask == null) {
  assert currPlan.getReducer() == null;
  ctx.setCurrMapJoinOp(mapJoin);
  GenMapRedUtils.initMapJoinPlan(mapJoin, ctx, true, true, false, pos);
  ctx.setCurrUnionOp(null);
} else {
  // The current plan can be thrown away after being merged with the
  // original plan
  // Is ctx.setCurrMapJoinOp(mapJoin); missing here?
  Task<? extends Serializable> uTask = ctx.getUnionTask(
      ctx.getCurrUnionOp()).getUTask();
  if (uTask.getId().equals(opMapTask.getId())) {
    GenMapRedUtils.joinPlan(mapJoin, null, opMapTask, ctx, pos, false, false, true);
  } else {
    GenMapRedUtils.joinPlan(mapJoin, uTask, opMapTask, ctx, pos, false, false, true);
  }
  currTask = opMapTask;
  ctx.setCurrTask(currTask);
}
Does the "else" block forget to call ctx.setCurrMapJoinOp(mapJoin)? I patched the code that way and got the plan I expected.
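To make the suspected bug concrete, here is a toy model of the if/else shape quoted from MapJoinFactory.java. Everything in it is hypothetical: Ctx, process and laterStep are made-up stand-ins, not Hive classes, and the model only assumes, as the report suggests, that some later plan-generation step reads the current map join operator from the shared context. It shows that when only the first branch records the operator, the else branch leaves a null (or, in a longer-lived context, stale) value for that later step.

```java
import java.util.Objects;

// Toy model with hypothetical names, not Hive's API.
class MapJoinContextModel {
    // Stand-in for the shared plan-generation context.
    static final class Ctx {
        String currMapJoinOp;          // stand-in for the "current map join" operator
        String currUnionOp = "union";  // stand-in for the current union operator
    }

    // Mirrors the if/else in the report; applyFix adds the call it suggests.
    static void process(Ctx ctx, String mapJoin, boolean planExists, boolean applyFix) {
        if (!planExists) {
            // First branch: the operator is always recorded.
            ctx.currMapJoinOp = mapJoin;
            ctx.currUnionOp = null;
        } else {
            if (applyFix) {
                // The call the report suspects is missing.
                ctx.currMapJoinOp = mapJoin;
            }
            // Merging with the existing plan is omitted in this model.
        }
    }

    // A hypothetical later step that relies on the context.
    static String laterStep(Ctx ctx) {
        return Objects.requireNonNullElse(ctx.currMapJoinOp, "<none>");
    }

    public static void main(String[] args) {
        Ctx unpatched = new Ctx();
        process(unpatched, "MAPJOIN(a)", true, false);
        System.out.println(laterStep(unpatched)); // <none>

        Ctx patched = new Ctx();
        process(patched, "MAPJOIN(a)", true, true);
        System.out.println(laterStep(patched)); // MAPJOIN(a)
    }
}
```

Run as is, the unpatched else path prints <none> and the patched one prints MAPJOIN(a). The model says nothing about how Hive actually uses the field afterwards; it only isolates the asymmetry between the two branches.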
Plan after my modification:
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-7 depends on stages: Stage-4, Stage-6
  Stage-1 depends on stages: Stage-7
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-8 depends on stages: Stage-0
  Stage-3 depends on stages: Stage-8
  Stage-6 is a root stage

STAGE PLANS:
  Stage: Stage-4
    Map Reduce
      Alias -> Map Operator Tree:
        sub-subquery1:b2-subquery1:join_src
          TableScan
            alias: join_src
            Filter Operator
              predicate:
                expr: (key = 11)
                type: boolean
              Select Operator
                expressions:
                  expr: key
                  type: string
                  expr: value
                  type: string
                outputColumnNames: _col0, _col1
                Limit
                  Reduce Output Operator
                    sort order:
                    tag: -1
                    value expressions:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: string
      Reduce Operator Tree:
        Extract
          Limit
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

  Stage: Stage-7
    Map Reduce Local Work
      Alias -> Map Local Tables:
        sub:a
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        sub:a
          TableScan
            alias: a
            HashTable Sink Operator
              condition expressions:
                0 {key} {value}
                1 {_col0} {_col1}
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[_col0]]
              Position of Big Table: 1

  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        file:/tmp/hive-xinyu/hive_2012-08-03_17-28-19_000_5428590854627305209/-mr-10003
          Union
            Map Join Operator
              condition map:
                Inner Join 0 to 1
              condition expressions:
                0 {key} {value}
                1 {_col0} {_col1}
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[_col0]]
              outputColumnNames: _col0, _col1, _col4, _col5
              Position of Big Table: 1
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
        file:/tmp/hive-xinyu/hive_2012-08-03_17-28-19_000_5428590854627305209/-mr-10004
          Union
            Map Join Operator
              condition map:
                Inner Join 0 to 1
              condition expressions:
                0 {key} {value}
                1 {_col0} {_col1}
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[_col0]]
              outputColumnNames: _col0, _col1, _col4, _col5
              Position of Big Table: 1
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      Local Work:
        Map Reduce Local Work

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        file:/tmp/hive-xinyu/hive_2012-08-03_17-28-19_000_5428590854627305209/-mr-10002
          Select Operator
            expressions:
              expr: _col0
              type: string
              expr: _col1
              type: string
              expr: _col4
              type: string
              expr: _col5
              type: string
            outputColumnNames: _col0, _col1, _col4, _col5
            Select Operator
              expressions:
                expr: _col0
                type: string
                expr: _col1
                type: string
                expr: _col4
                type: string
                expr: _col5
                type: string
              outputColumnNames: _col0, _col1, _col2, _col3
              Select Operator
                expressions:
                  expr: _col0
                  type: string
                  expr: _col1
                  type: string
                  expr: _col2
                  type: string
                  expr: _col3
                  type: string
                outputColumnNames: _col0, _col1, _col2, _col3
                Limit
                  Reduce Output Operator
                    sort order:
                    tag: -1
                    value expressions:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: string
                      expr: _col2
                      type: string
                      expr: _col3
                      type: string
      Reduce Operator Tree:
        Extract
          Limit
            File Output Operator
              compressed: false
              GlobalTableId: 1
              table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                name: default.union_mapjoin

  Stage: Stage-0
    Move Operator
      <...>

  Stage: Stage-8
    Create Table Operator:
      <...>

  Stage: Stage-3
    Stats-Aggr Operator

  Stage: Stage-6
    Map Reduce
      Alias -> Map Operator Tree:
        sub-subquery2:b2-subquery2:join_src
          TableScan
            alias: join_src
            Filter Operator
              predicate:
                expr: (key = 22)
                type: boolean
              Select Operator
                expressions:
                  expr: key
                  type: string
                  expr: value
                  type: string
                outputColumnNames: _col0, _col1
                Limit
                  Reduce Output Operator
                    sort order:
                    tag: -1
                    value expressions:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: string
      Reduce Operator Tree:
        Extract
          Limit
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat