[HADOOP-1014] map/reduce is corrupting data between map and reduce - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.11.1
Fix Version/s: 0.11.2
Component/s: None
Labels: None

Description

It appears that data is being corrupted at random between the map and the reduce phases. This is a blocker until it is resolved. Two relevant messages were posted on hadoop-dev:

from Mike Smith:

The map/reduce jobs do not produce consistent results when you rerun the
same job, in both the Hadoop 0.11 release and trunk. I have observed this
inconsistency in the map output across different jobs. A simple test to
double-check is to use Hadoop 0.11 with Nutch trunk.

from Albert Chern:

I am having the same problem with my own map reduce jobs. I have a job
which requires two pieces of data per key, and just as a sanity check I make
sure that it gets both in the reducer, but sometimes it doesn't. What's
even stranger is, the same tasks that complain about missing key/value pairs
will maybe fail two or three times, but then succeed on a subsequent try,
which leads me to believe that the bug has to do with randomization (I'm not
sure, but I think the map outputs are shuffled?).

All of my code works perfectly with 0.9, so I went back and just compared
the sizes of the outputs. For some jobs, the outputs from 0.11 were
consistently 4 bytes larger, probably due to changes in SequenceFile. But
for others, the output sizes were all over the place. Some partitions were
empty, some were correct, and some were missing data. There seems to be
something seriously wrong with 0.11, so I suggest you use 0.9. I've been
trying to pinpoint the bug but its random nature is really annoying.
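
The reducer-side sanity check described above (every key must arrive with both of its values) can be illustrated with a minimal sketch in plain Java that does not use the Hadoop API. The class name `PairCheck`, its methods, and the sample records are hypothetical and are not taken from the reporter's job; the sketch only shows the kind of per-key count check that would expose missing key/value pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PairCheck {
    // Assumption for illustration: each key must carry exactly two values.
    static final int EXPECTED_VALUES_PER_KEY = 2;

    /** Groups {key, value} records by key, mimicking what a reducer receives. */
    static Map<String, List<String>> groupByKey(List<String[]> records) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] record : records) {
            grouped.computeIfAbsent(record[0], k -> new ArrayList<>()).add(record[1]);
        }
        return grouped;
    }

    /** Returns the keys whose value count differs from the expected count. */
    static List<String> incompleteKeys(Map<String, List<String>> grouped) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            if (entry.getValue().size() != EXPECTED_VALUES_PER_KEY) {
                bad.add(entry.getKey());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Hypothetical map output in which key "b" has lost its second value.
        List<String[]> records = Arrays.asList(
            new String[] {"a", "left"},
            new String[] {"a", "right"},
            new String[] {"b", "left"});
        System.out.println("Incomplete keys: " + incompleteKeys(groupByKey(records)));
    }
}
```

In a real job the same count check would sit inside the reduce function and fail the task when a key is incomplete, which matches the reported symptom of tasks failing on some attempts and succeeding on others.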

Attachments

TestMapRed.java, 15/Feb/07 11:02, 11 kB, Devaraj Das
TestMapRed.patch, 14/Feb/07 19:51, 34 kB, Riccardo Boscolo
TestMapRed2.patch, 14/Feb/07 20:58, 35 kB, Riccardo Boscolo
zero-size-inmem-fs.patch, 15/Feb/07 11:02, 2 kB, Devaraj Das

Issue Links

incorporates: HADOOP-333 we should have some checks that the sort benchmark generates correct outputs (Closed)

relates to: HADOOP-1027 Fix the RAM FileSystem/Merge problems (reported in HADOOP-1014) (Closed)

People

Assignee: Devaraj Das

Reporter: Owen O'Malley

Votes: 0

Watchers: 0

Dates

Created: 13/Feb/07 05:02

Updated: 08/Jul/09 16:52

Resolved: 16/Feb/07 22:34