Description
Reproducer
set hive.cbo.returnpath.hiveop=true;
set hive.map.aggr=false;
create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
explain select count(distinct a) from abcd group by b;
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: b, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
explain select count(distinct a) from abcd group by c;
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: c, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
In both cases the map-side Reduce Output Operator has the wrong keys: both plans show a, a, where the keys should be b, a and c, a respectively.
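
To make the mismatch easy to check mechanically, the following is a rough sketch (not part of Hive) that pulls the Reduce Output Operator key columns out of EXPLAIN text. The helper name reduce_sink_keys is hypothetical, and the sketch assumes each "key expressions:" entry is followed by a "sort order:" entry, as in the plans above.

```python
import re

def reduce_sink_keys(plan_text):
    """Return the key column names of each Reduce Output Operator in EXPLAIN text.

    Hypothetical helper: assumes every "key expressions:" entry is followed
    by "sort order:", as in the plans quoted in this report.
    """
    keys = []
    pattern = r"key expressions:\s*(.+?)\s*sort order:"
    for match in re.finditer(pattern, plan_text, flags=re.S):
        # Each entry looks like "a (type: int)"; keep only the column name.
        keys.append(re.findall(r"(\S+) \(type: [^)]*\)", match.group(1)))
    return keys

# Fragment copied from the plan for "group by b".
observed = """
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
"""

print(reduce_sink_keys(observed))  # [['a', 'a']]
```

For the first query the keys should come back as b, a, and for the second as c, a, so a comparison against those lists would flag both plans.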