Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 0.13.1, 0.14.0
- None
- Amazon Elastic Map Reduce, AMI 3.3.1, Hadoop Amazon 2.4.0, Hive 0.13.1
Description
It looks like the query below returns incorrect results on Hive 0.13.1, but it was working fine on Hive 0.11.
I have the following table:
CREATE TABLE `t`(
`category` int,
`live` int,
`comments` int)
with the following data:
hive> select * from t;
OK
3 0 2
2 0 2
8 0 2
The query:
hive> select category, max(live) live, max(comments) comments, rank() OVER (PARTITION BY category ORDER BY comments) rank1
FROM t
GROUP BY category
GROUPING SETS ((), (category))
HAVING max(comments) > 0;
returns the following results:
NULL 1 48 1
2 1 49 1
3 1 49 1
8 1 49 1
When grouping sets are used together with the rank() function, the max() function returns incorrect results: given the data above, max(live) should be 0 and max(comments) should be 2 in every row, not 1 and 48/49. Everything works fine if I remove the GROUPING SETS clause and split the query into two independent queries, or if I remove the rank() function.
This looks like a bug to me, but please review. That said, I'm not sure whether it is an Amazon-specific issue or a general Hive issue.
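For reference, the workaround of splitting the query in two could look roughly like the sketch below. This is one possible rewrite, not the exact queries that were run: the subquery aliases `per_category` and `total` are made up for illustration, and the aggregation is moved into a subquery so that rank() is applied to already-aggregated rows. It has not been verified against the affected versions.

```sql
-- Query 1: stands in for the (category) grouping set.
-- Aggregate per category first, then rank the aggregated rows.
SELECT category, live, comments,
       rank() OVER (PARTITION BY category ORDER BY comments) rank1
FROM (
  SELECT category, max(live) live, max(comments) comments
  FROM t
  GROUP BY category
) per_category          -- hypothetical alias
WHERE comments > 0;     -- replaces HAVING max(comments) > 0

-- Query 2: stands in for the () grouping set (grand total).
-- category is NULL here, as in the grouping-sets output.
SELECT category, live, comments,
       rank() OVER (PARTITION BY category ORDER BY comments) rank1
FROM (
  SELECT CAST(NULL AS INT) category, max(live) live, max(comments) comments
  FROM t
) total                 -- hypothetical alias
WHERE comments > 0;
```

With the three rows shown above, both queries would be expected to report live = 0 and comments = 2 in every row, which is what the original query should have returned as well.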