Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 0.11.0, 0.12.0, 0.13.0, 0.14.0
- None
Description
For example, consider a simple windowing query like this:
SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
Its plan is:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        x
          TableScan
            alias: x
            Reduce Output Operator
              key expressions:
                    expr: a
                    type: int
                    expr: a
                    type: int
              sort order: ++
              Map-reduce partition columns:
                    expr: a
                    type: int
              tag: -1
              value expressions:
                    expr: a
                    type: int
                    expr: b
                    type: string
      Reduce Operator Tree:
        Extract
          PTF Operator
            Select Operator
              expressions:
                    expr: _col0
                    type: int
                    expr: _col1
                    type: string
                    expr: _wcol0
                    type: bigint
              outputColumnNames: _col0, _col1, _col2
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
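The ReduceSinkOperator (shown as "Reduce Output Operator" in the plan) lists column "a" twice in its key expressions, with sort order "++". The second copy adds nothing to the sort or to the partitioning, but it is still serialized into every map output key, so this redundancy can increase the size of the map output.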
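To illustrate the kind of cleanup this improvement asks for, here is a minimal sketch in Python. It is not Hive's implementation; the function name `dedup_key_columns` and the representation of key columns as a list of names plus a sort-order string are assumptions made only for this example.

```python
def dedup_key_columns(key_columns, sort_order):
    """Drop repeated key columns, keeping the first occurrence of each.

    Hypothetical helper for illustration only (not a Hive API).
    key_columns: column names in key order, e.g. ["a", "a"]
    sort_order:  one '+' or '-' per key column, e.g. "++"
    """
    if len(key_columns) != len(sort_order):
        raise ValueError("need exactly one sort direction per key column")

    seen = set()
    unique_columns = []
    unique_order = []
    for column, direction in zip(key_columns, sort_order):
        if column in seen:
            # A later copy of an earlier key column can never break a tie,
            # so it can be dropped without changing the sort result.
            continue
        seen.add(column)
        unique_columns.append(column)
        unique_order.append(direction)
    return unique_columns, "".join(unique_order)


# The key columns and sort order from the plan above.
columns, order = dedup_key_columns(["a", "a"], "++")
```

Under these assumptions, the key from the plan above shrinks from two columns to one, and the partition column "a" is still part of the key.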