Affects Version/s: 0.20.1
Fix Version/s: None
There is an issue in org.apache.hadoop.mapred.MapTask in 0.20 that can cause data corruption when the size of a single value produced by the mapper exceeds the size of the map output buffer (roughly io.sort.mb).
I experienced this issue in CDH4.2.1, but am logging the issue here for greater visibility in case anyone else might run across the issue.
The issue does not exist in 0.21 and beyond because of MAPREDUCE-64. That JIRA significantly changed how map output buffering is done, and those changes appear to have resolved the issue.
I expect this bug will likely be closed as Won't Fix since 0.20 is obsolete. As stated above, I am just logging the issue for visibility in case anyone still running something based on 0.20 encounters the same problem.
In my situation the issue manifested as an ArrayIndexOutOfBoundsException in the reduce phase when deserializing a key, causing the job to fail. However, the problem could manifest in a more dangerous fashion: the affected job could succeed but produce corrupt output. The stack trace I saw was:
2014-02-13 01:07:34,690 WARN org.apache.hadoop.mapred.Child: Error running child
at java.security.AccessController.doPrivileged(Native Method)
The problem appears to be in org.apache.hadoop.mapred.MapTask.MapOutputBuffer.Buffer.write(byte[], int, int). The sequence of events that leads up to the issue is:
- some complete records (cumulative size less than the total buffer size) are written to the buffer
- a large record (over io.sort.mb) starts writing
- the soft buffer limit is exceeded and a spill starts
- the write of the large record continues
- the buffer becomes full
- wrap evaluates to true, indicating the buffer can be safely wrapped
- writing of the large record continues until a write occurs such that bufindex + len == bufstart exactly; buffull then evaluates to false, so the data is written to the buffer without incident
- writing of the large value continues with another call to write(), starting the corruption of the buffer: a full buffer can no longer be detected, because the buffull logic used when bufindex >= bufstart no longer checks against bufstart
The key to this problem occurring is a write where bufindex + len equals bufstart exactly.
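The off-by-one can be sketched with the following simplified simulation. The variable names (bufstart, bufindex, bufvoid, buffull) come from the 0.20 MapTask source, but the method below is a reduced illustration of the full-buffer check, not the actual Hadoop code.

```java
// Simplified sketch of the circular-buffer "full" check performed by
// MapTask.MapOutputBuffer.Buffer.write() in Hadoop 0.20.
public class BufFullCheck {
    // bufstart: start of the unspilled region; bufindex: current write
    // position; bufvoid: physical end of the byte array; len: write size.
    static boolean buffull(int bufstart, int bufindex, int bufvoid, int len) {
        if (bufindex >= bufstart) {
            // Writer is ahead of the spill region: full only at the
            // physical end of the array.
            return bufindex + len > bufvoid;
        } else {
            // Writer has wrapped behind the spill region: full when the
            // write would pass bufstart. The strict '>' means a write that
            // lands EXACTLY on bufstart is not flagged as full.
            return bufindex + len > bufstart;
        }
    }

    public static void main(String[] args) {
        int bufvoid = 100;   // buffer capacity
        int bufstart = 40;   // start of unspilled data
        int bufindex = 30;   // wrapped write position, behind bufstart

        // A 10-byte write lands exactly on bufstart: not flagged as full.
        System.out.println(buffull(bufstart, bufindex, bufvoid, 10)); // prints false

        // After that write bufindex == bufstart (40), so the next write is
        // checked against bufvoid instead of bufstart and silently
        // overwrites the unspilled region.
        System.out.println(buffull(bufstart, 40, bufvoid, 10)); // prints false
    }
}
```

A write one byte larger (len == 11 in the first case above) would be caught as full; only the exact-landing case slips through, which is why the bug is so rarely triggered.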
I have titled the issue as being about writing large records (over io.sort.mb), but the issue could also occur with smaller records if the serializer generated a write of exactly the right size. For example, suppose the buffer is close to full but has not yet exceeded the soft limit, and a collect() on a new value triggers a write() such that bufindex + len == bufstart. That write would have to be relatively large, greater than the free space reserved by the soft limit (20% of the buffer by default), which makes this path pretty unlikely.