Affects Version/s: 0.20.1
Fix Version/s: None
There is an issue in org.apache.hadoop.mapred.MapTask in 0.20 that can cause data corruption when the size of a single value produced by the mapper exceeds the size of the map output buffer (roughly io.sort.mb).
I experienced this issue in CDH4.2.1, but am logging the issue here for greater visibility in case anyone else might run across the issue.
The issue does not exist in 0.21 and beyond because of MAPREDUCE-64. That JIRA significantly changed how map output buffering is done, and those changes appear to have resolved the issue.
I expect this bug will likely be closed as Won't Fix since 0.20 is obsolete. As stated above, I am just logging the issue for visibility in case anyone still running something based on 0.20 encounters the same problem.
In my situation the issue manifested as an ArrayIndexOutOfBoundsException in the reduce phase when deserializing a key, causing the job to fail. However, the problem could manifest in a more dangerous fashion: the affected job could succeed but produce corrupt output. The stack trace I saw was:
2014-02-13 01:07:34,690 WARN org.apache.hadoop.mapred.Child: Error running child
at java.security.AccessController.doPrivileged(Native Method)
The problem appears to be in org.apache.hadoop.mapred.MapTask.MapOutputBuffer.Buffer.write(byte[], int, int). The sequence of events that leads up to the issue is:
- some complete records (cumulative size less than the total buffer size) are written to the buffer
- a large record (over io.sort.mb) starts writing
- the soft buffer limit is exceeded and a spill starts
- the write of the large record continues
- the buffer becomes full
- wrap evaluates to true, indicating the buffer can be safely wrapped
- writing of the large record continues until a write occurs such that bufindex + len == bufstart exactly; buffull then evaluates to false, so the data is written to the buffer without incident
- writing of the large value continues with another call to write(), starting the corruption of the buffer: a full buffer can no longer be detected, because the buffull logic used when bufindex >= bufstart no longer checks against bufstart
The key to this problem occurring is a write where bufindex + len equals bufstart exactly.
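The off-by-one can be sketched with the following simplified simulation. The variable names (bufstart, bufindex, bufvoid, buffull) come from the 0.20 MapTask source, but the method below is a reduced illustration of the full-buffer check, not the actual Hadoop code.

```java
// Simplified sketch of the circular-buffer "full" check performed by
// MapTask.MapOutputBuffer.Buffer.write() in Hadoop 0.20.
public class BufFullCheck {
    // bufstart: start of the unspilled region; bufindex: current write
    // position; bufvoid: physical end of the byte array; len: write size.
    static boolean buffull(int bufstart, int bufindex, int bufvoid, int len) {
        if (bufindex >= bufstart) {
            // Writer is ahead of the spill region: full only at the
            // physical end of the array.
            return bufindex + len > bufvoid;
        } else {
            // Writer has wrapped behind the spill region: full when the
            // write would pass bufstart. The strict '>' means a write that
            // lands EXACTLY on bufstart is not flagged as full.
            return bufindex + len > bufstart;
        }
    }

    public static void main(String[] args) {
        int bufvoid = 100;   // buffer capacity
        int bufstart = 40;   // start of unspilled data
        int bufindex = 30;   // wrapped write position, behind bufstart

        // A 10-byte write lands exactly on bufstart: not flagged as full.
        System.out.println(buffull(bufstart, bufindex, bufvoid, 10)); // prints false

        // After that write bufindex == bufstart (40), so the next write is
        // checked against bufvoid instead of bufstart and silently
        // overwrites the unspilled region.
        System.out.println(buffull(bufstart, 40, bufvoid, 10)); // prints false
    }
}
```

A write one byte larger (len == 11 in the first case above) would be caught as full; only the exact-landing case slips through, which is why the bug is so rarely triggered.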
I have titled the issue as being about writing large records (over io.sort.mb), but the issue could also occur with smaller records if the serializer generated a write of exactly the right size. For example, suppose the buffer is close to full but has not yet exceeded the soft limit, and a collect() on a new value triggers a write() such that bufindex + len == bufstart. That write would have to be relatively large, greater than the free space reserved by the soft limit (20% of the buffer by default), which makes this path pretty unlikely.