Hadoop Map/Reduce / MAPREDUCE-15

SequenceFile RecordReader should skip bad records

Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved

    Description

      Currently a bad record in a SequenceFile leads to the entire job failing. The best workaround is to skip the errant file manually (by looking at which map task failed). This is a sucky option because it's manual and because one should be able to skip a single SequenceFile block (instead of the entire file).

      While we don't see this often (and I don't know why this corruption happened), here's an example stack:
      Status : FAILED java.lang.NegativeArraySizeException
      at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:96)
      at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:75)
      at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:130)
      at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1640)
      at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1712)
      at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
      at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:176)
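
      One plausible reading of that trace (an assumption, not something confirmed from the data): the corrupted value-length field decodes to a huge positive int, BytesWritable.setSize then grows its backing buffer (historically by size * 3 / 2), that multiplication overflows 32-bit int arithmetic, and the array allocation inside setCapacity sees a negative size. The arithmetic in isolation, with a made-up length value:

      // Sketch of the suspected failure mode, independent of Hadoop itself.
      public class CorruptLengthOverflow {
        public static void main(String[] args) {
          int corruptLength = 1000000000;           // hypothetical garbage length from a bad record
          int newCapacity = corruptLength * 3 / 2;  // int overflow: -647483648
          byte[] buffer = new byte[newCapacity];    // throws NegativeArraySizeException
          System.out.println(buffer.length);        // never reached
        }
      }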

      Ideally the RecordReader should just skip the entire block (up to the next sync mark) if it hits an unrecoverable error while reading.
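
      A minimal sketch of that behaviour, written against the public SequenceFile.Reader API rather than as the actual patch (class name, key/value types, and error handling are all illustrative):

      import java.io.IOException;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.BytesWritable;
      import org.apache.hadoop.io.SequenceFile;

      // Sketch only: read a sequence file, but on an unrecoverable decode error
      // seek to the next sync mark and keep going, so one block is lost instead
      // of the whole job. A real fix would live inside the RecordReader and
      // bound how many blocks it is willing to drop.
      public class SkippingSequenceFileRead {
        public static void main(String[] args) throws IOException {
          Configuration conf = new Configuration();
          Path path = new Path(args[0]);
          FileSystem fs = path.getFileSystem(conf);
          SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
          BytesWritable key = new BytesWritable();    // assumes BytesWritable key/value classes
          BytesWritable value = new BytesWritable();
          long skipped = 0;
          boolean more = true;
          while (more) {
            try {
              more = reader.next(key, value);
              // ... hand (key, value) to the map function here ...
            } catch (Exception e) {                   // e.g. NegativeArraySizeException, ChecksumException
              // Jump to the next sync mark past the current position; records
              // between the corrupt one and that mark are dropped.
              reader.sync(reader.getPosition());
              skipped++;
            }
          }
          reader.close();
          System.err.println("skipped " + skipped + " corrupt block(s)");
        }
      }

      Since reader.sync(long) seeks to the first sync mark after the given position, the granularity of what gets dropped is the distance between sync marks, i.e. a block rather than the whole file.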

      This was the consensus in HADOOP-153 as well (that data corruption should be handled by RecordReaders), and HADOOP-3144 did something similar for TextInputFormat.

People

    • Assignee: Unassigned
    • Reporter: Joydeep Sen Sarma (jsensarma)
    • Votes: 0
    • Watchers: 7