[HIVE-5237] Incorrect group-by aggregation in 0.11.0 - ASF JIRA

A GROUP BY over a subquery that itself uses GROUP BY does not aggregate results correctly in Hive 0.11.0.

To reproduce:

Put the file

1,b
2,c
2,b
3,a
3,c
4,a

in HDFS, and run

create external table abc (x int, y string) row format delimited fields terminated by ',' location '/data/';

The query

select
        x,
        count(*)
from
        (select
                x,
                y
        from
                abc
        group by
                x,
                y
        ) a
group by
        x;

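will then give an incorrect result instead of the correct counts, which are 1, 2, 2 and 1 for x = 1, 2, 3 and 4 respectively (every (x, y) pair in the file is distinct, so the inner group by keeps all six rows).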

In 0.9.0 and 0.10.0 the same query returns the correct result.
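
The expected counts can be checked independently of Hive. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in engine: it is not Hive and says nothing about the bug itself, it only shows what the query should return for the sample data. The function name `expected_counts` and the column type `text` (in place of Hive's `string`) are choices made for this illustration.

```python
import sqlite3

# Rows from the sample file in the report.
ROWS = [(1, "b"), (2, "c"), (2, "b"), (3, "a"), (3, "c"), (4, "a")]

# The query from the report, unchanged apart from whitespace.
QUERY = """
select x, count(*)
from (select x, y from abc group by x, y) a
group by x
"""


def expected_counts(rows=ROWS):
    """Run the reported query in an in-memory SQLite database (a stand-in, not Hive)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table abc (x int, y text)")
        conn.executemany("insert into abc values (?, ?)", rows)
        # Sort in Python because the query itself has no ORDER BY.
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    for x, n in expected_counts():
        print(x, n)
    # prints:
    # 1 1
    # 2 2
    # 3 2
    # 4 1
```

Because the inner group by only removes duplicate (x, y) pairs, the outer count is the number of distinct y values per x. Any engine that returns something else for this data is aggregating incorrectly.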

duplicates

HIVE-5149 ReduceSinkDeDuplication can pick the wrong partitioning columns