[HIVE-9025] join38.q (without map join) produces incorrect result when testing with multiple reducers - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.14.0
Fix Version/s: 1.0.0
Component/s: Logical Optimizer
Labels: None

Description

I have this query from a modified version of join38.q, which does NOT use map join:

FROM src a JOIN tmp b ON (a.key = b.col11)
SELECT a.value, b.col5, count(1) as count
where b.col11 = 111
group by a.value, b.col5;

If I set mapred.reduce.tasks to 1, the result is correct. But if I set it to a larger number (3, for instance), the result is

val_111	105	1

which is wrong.

I think the issue is that, in this case, ConstantPropagationProcFactory overwrites the partition columns of the reduce sink desc with an empty list. Later, in ReduceSinkOperator#computeHashCode, since partitionEval has length 0, a random number is used as the hash code for each row. As a result, rows with the same key are distributed to different reducers, which leads to an incorrect result.
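
To make the suspected mechanism concrete, here is a minimal, self-contained sketch of the routing behavior described above. It is not Hive's code: the class ReducerRoutingSketch and its methods hashFor, reducerFor, and countPerReducer are hypothetical names, and the modulo-based reducer assignment is an assumption modeled on ordinary hash partitioning. It only illustrates why a per-row random hash splits one group across several reducers, so that each reducer emits its own partial count for that group.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class ReducerRoutingSketch {

    // Hypothetical stand-in for the hash choice described in the report:
    // hash the key when partition columns exist, otherwise draw a fresh
    // random number for every row.
    static int hashFor(String key, boolean hasPartitionCols, Random rnd) {
        return hasPartitionCols ? key.hashCode() : rnd.nextInt();
    }

    // Assumed reducer assignment: clear the sign bit, then take the modulo.
    static int reducerFor(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    // Simulates the shuffle followed by a per-reducer count(1) grouped by key.
    // Returns: reducer index -> (key -> number of rows that reducer counted).
    static Map<Integer, Map<String, Integer>> countPerReducer(
            List<String> keys, int numReducers, boolean hasPartitionCols, Random rnd) {
        Map<Integer, Map<String, Integer>> perReducer = new HashMap<>();
        for (String key : keys) {
            int reducer = reducerFor(hashFor(key, hasPartitionCols, rnd), numReducers);
            perReducer.computeIfAbsent(reducer, r -> new HashMap<>())
                      .merge(key, 1, Integer::sum);
        }
        return perReducer;
    }

    public static void main(String[] args) {
        // Made-up input: 1000 rows that all share the join key "111".
        List<String> keys = Collections.nCopies(1000, "111");
        System.out.println("partitioned by key:   "
                + countPerReducer(keys, 3, true, new Random(42)));
        System.out.println("empty partition cols: "
                + countPerReducer(keys, 3, false, new Random(42)));
    }
}
```

With key-based hashing, every row for a given key reaches the same reducer, so that reducer sees the whole group. With the random fallback, the rows for one key are spread over the reducers and each reducer counts only its own share, which is consistent with a count that is only correct when mapred.reduce.tasks is 1.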

Attachments

HIVE-9025.patch (07/Dec/14 09:17, 22 kB, Ted Xu)
HIVE-9025.1.patch (11/Dec/14 01:37, 40 kB, Ted Xu)

Issue Links

blocks: HIVE-9026 Re-enable remaining tests after HIVE-8970 [Spark Branch] (Resolved)
is broken by: HIVE-5771 Constant propagation optimizer for Hive (Closed)
links to: RB request

People

Assignee: Ted Xu
Reporter: Chao Sun
Votes: 0
Watchers: 7

Dates

Created: 05/Dec/14 00:36
Updated: 19/Feb/15 18:21
Resolved: 11/Dec/14 18:42