[HIVE-7045] Wrong results in multi-table insert aggregating without group by clause - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.10.0, 0.12.0
Fix Version/s: 0.14.0
Component/s: None
Labels:
None

Description

This happens whenever there are more than 1 reducers.

The scenario :

CREATE TABLE t1 (a int, b int);
CREATE TABLE t2 (cnt int) PARTITIONED BY (var_name string);

insert into table t1 select 1,1 from asd limit 1;
insert into table t1 select 2,2 from asd limit 1;

t1 contains :
1 1
2 2

from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt ;

select * from t2;
returns :
2 a
2 b

as expected.

Setting the number of reducers higher than 1 :

set mapred.reduce.tasks=2;

from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt;

select * from t2;
1 a
1 a
1 b
1 b

Wrong results.

This happens when ever t1 is big enough to automatically generate more than 1 reducers and without specifying it directly.

adding "group by 1" in the end of each insert solves the problem :

from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1
insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 1;

generates :
2 a
2 b

This should work without the group by...
The number of rows for each partition will be the amount of reducers.
Each reducer calculated a sub total of the count.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HIVE-7045.1.patch.txt
04/Jul/14 06:36
1 kB
Navis Ryu

Issue Links

links to

review board

Activity

People

Assignee:: Navis Ryu

Reporter:: dima machlin

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 12/May/14 14:08

Updated:: 13/Nov/14 19:43

Resolved:: 07/Jul/14 22:07