Details

Type: Bug
Status: Reopened
Priority: Major
Resolution: Unresolved
Description
Currently a bad record in a SequenceFile causes the entire job to fail. The best available workaround is to skip the errant file manually (by looking at which map task failed). This is a poor option because it is manual and because one should be able to skip a single SequenceFile block (instead of the entire file).
While we don't see this often (and I don't know why this corruption happened), here's an example stack trace:
Status : FAILED java.lang.NegativeArraySizeException
at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:96)
at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:75)
at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:130)
at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1640)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1712)
at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:176)
Ideally the RecordReader should just skip the entire chunk when it hits an unrecoverable error while reading.
This was the consensus in HADOOP-153 as well (that data corruption should be handled by RecordReaders), and HADOOP-3144 did something similar for TextInputFormat.
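As a rough illustration of the idea (not a patch for this issue), the sketch below reads a SequenceFile directly and, when a record throws, seeks to the next sync mark and resumes instead of failing. The class name and input-path argument are hypothetical; BytesWritable key/value types are assumed from the stack trace above. It relies on SequenceFile.Reader.sync(long), which positions the reader at the first sync mark after the given offset.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;

// Minimal sketch: skip corrupt chunks in a SequenceFile by resyncing,
// rather than letting one bad record fail the whole job.
public class SkippingSequenceFileReader {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(args[0]);   // hypothetical input path

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        BytesWritable key = new BytesWritable();    // assumed key type
        BytesWritable value = new BytesWritable();  // assumed value type

        long good = 0, skipped = 0;
        boolean more = true;
        while (more) {
            long pos = reader.getPosition();
            try {
                more = reader.next(key, value);
                if (more) good++;
            } catch (Exception corrupt) {  // e.g. NegativeArraySizeException
                skipped++;
                // Jump past the bad region: sync() seeks to the first sync
                // mark after the given offset. Advancing by one byte avoids
                // re-reading the same corrupt record.
                reader.sync(pos + 1);
                if (reader.getPosition() <= pos) {
                    break; // defensive: stop if we failed to advance
                }
            }
        }
        reader.close();
        System.out.println("records read: " + good
            + ", chunks skipped: " + skipped);
    }
}

A real fix inside the RecordReader would follow the same pattern but would also need to bound how much data may be skipped and report skipped ranges through counters, as HADOOP-3144 does for TextInputFormat.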
Issue Links

- relates to: MAPREDUCE-21 "NegativeArraySizeException in reducer with new api" (Resolved)