Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Duplicate
-
0.11.0
-
None
-
None
-
None
Description
group by with sub queries does not correctly aggregate results in Hive 0.11.0.
To reproduce:
Put the file
1,b 2,c 2,b 3,a 3,c 4,a
in HDFS, and run
create external table abc (x int, y string) row format delimited fields terminated by ',' location '/data/';
The query
select x, count(*) from (select x, y from abc group by x, y ) a group by x;
will then give the result
2 1 3 1 2 1 4 1 3 1 1 1
instead of the correct
1 1 2 2 3 2 4 1
In 0.9.0 and 0.10.0 this is all working correctly.
Attachments
Issue Links
- duplicates
-
HIVE-5149 ReduceSinkDeDuplication can pick the wrong partitioning columns
- Closed