Description
For this query, the plan doesn't look correct:
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-1 depends on stages: Stage-5, Stage-4
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0
  Stage-5 is a root stage

STAGE PLANS:
  Stage: Stage-4
    Spark
      DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:6
      Vertices:
        Map 4
            Map Operator Tree:
                TableScan
                  alias: x
                  Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                    Spark HashTable Sink Operator
                      condition expressions:
                        0 {_col1}
                        1 {value}
                      keys:
                        0 _col0 (type: string)
                        1 key (type: string)
                    Reduce Output Operator
                      key expressions: key (type: string)
                      sort order: +
                      Map-reduce partition columns: key (type: string)
                      Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                      value expressions: value (type: string)
            Local Work:
              Map Reduce Local Work

  Stage: Stage-1
    Spark
      Edges:
        Union 2 <- Map 1 (NONE, 0), Map 3 (NONE, 0)
      DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:4
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: x
                  Filter Operator
                    predicate: (key < 20) (type: boolean)
                    Select Operator
                      expressions: key (type: string), value (type: string)
                      outputColumnNames: _col0, _col1
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        condition expressions:
                          0 {_col1}
                          1 {key} {value}
                        keys:
                          0 _col0 (type: string)
                          1 key (type: string)
                        outputColumnNames: _col1, _col2, _col3
                        input vertices:
                          1 Map 4
                        Select Operator
                          expressions: _col2 (type: string), _col3 (type: string), _col1 (type: string)
                          outputColumnNames: _col0, _col1, _col2
                          File Output Operator
                            compressed: false
                            table:
                                input format: org.apache.hadoop.mapred.TextInputFormat
                                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                                name: default.dest_j1
            Local Work:
              Map Reduce Local Work
        Map 3
            Map Operator Tree:
                TableScan
                  alias: x1
                  Filter Operator
                    predicate: (key > 100) (type: boolean)
                    Select Operator
                      expressions: key (type: string), value (type: string)
                      outputColumnNames: _col0, _col1
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        condition expressions:
                          0 {_col1}
                          1 {key} {value}
                        keys:
                          0 _col0 (type: string)
                          1 key (type: string)
                        outputColumnNames: _col1, _col2, _col3
                        input vertices:
                          1 Map 4
                        Select Operator
                          expressions: _col2 (type: string), _col3 (type: string), _col1 (type: string)
                          outputColumnNames: _col0, _col1, _col2
                          File Output Operator
                            compressed: false
                            table:
                                input format: org.apache.hadoop.mapred.TextInputFormat
                                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                                name: default.dest_j1
            Local Work:
              Map Reduce Local Work
        Union 2
            Vertex: Union 2

  Stage: Stage-2
    Dependency Collection

  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.dest_j1

  Stage: Stage-3
    Stats-Aggr Operator

  Stage: Stage-5
    Spark
      DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:5
      Vertices:
        Map 4
            Map Operator Tree:
                TableScan
                  alias: x
                  Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                    Spark HashTable Sink Operator
                      condition expressions:
                        0 {_col1}
                        1 {value}
                      keys:
                        0 _col0 (type: string)
                        1 key (type: string)
                    Reduce Output Operator
                      key expressions: key (type: string)
                      sort order: +
                      Map-reduce partition columns: key (type: string)
                      Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                      value expressions: value (type: string)
            Local Work:
              Map Reduce Local Work

Time taken: 0.127 seconds, Fetched: 156 row(s)
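Note that Stage-4 and Stage-5 are identical apart from the DagName suffix (:6 vs. :5), and Stage-1 depends on both. Also, in Stage-4 a Reduce Output Operator (RS) appears in parallel with the Spark HashTable Sink (HTS) operator, which is strange.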
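As a quick way to confirm the duplication, the stage bodies can be compared after masking the DagName value. The following is only a sketch: `normalize` and `duplicate_stages` are hypothetical helpers (not part of Hive), and the sample strings are shortened excerpts of the plan above rather than the full stage text.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical helpers for inspecting EXPLAIN output; not a Hive API.
DAG_NAME = re.compile(r"DagName:\s*\S+")


def normalize(stage_body: str) -> str:
    """Mask the per-DAG identifier and collapse whitespace."""
    masked = DAG_NAME.sub("DagName: <id>", stage_body)
    return " ".join(masked.split())


def duplicate_stages(stages: Dict[str, str]) -> List[List[str]]:
    """Return groups of stage names whose normalized bodies are equal."""
    groups = defaultdict(list)
    for name, body in stages.items():
        groups[normalize(body)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Shortened excerpts of the stage bodies (assumption: the real input
# would be the full text of each "Stage: ..." block).
stages = {
    "Stage-4": "Spark DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:6 "
               "Vertices: Map 4 Map Operator Tree: TableScan alias: x",
    "Stage-5": "Spark DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:5 "
               "Vertices: Map 4 Map Operator Tree: TableScan alias: x",
    "Stage-2": "Dependency Collection",
}

print(duplicate_stages(stages))  # [['Stage-4', 'Stage-5']]
```

This only detects textual equality of stages; it says nothing about why the planner emitted the same work twice.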
Attachments
Issue Links
- relates to HIVE-9044 Union input to a join operator poses problem when converting to map join [Spark Branch] (Open)