Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.2.0
-
None
Description
When hive.groupby.skewindata=true, the following query based on TPC-H gives wrong results:
set hive.groupby.skewindata=true;
select l_returnflag, count(*), count(distinct l_linestatus)
from lineitem
group by l_returnflag
limit 10;
The query plan shows that it generates only one MapReduce job instead of two theoretically, which is dictated by hive.groupby.skewindata=true.
The problem arises only when
count(*)
and
count(distinct)
exist together.
Attachments
Attachments
Issue Links
- is duplicated by
-
HIVE-7261 Calculation works wrong when hive.groupby.skewindata is true and count(*) count(distinct) group by work simultaneously
-
- Resolved
-