Hadoop Map/Reduce
MAPREDUCE-6216

Seeking backwards in MapFiles does not always correctly sync the underlying SequenceFile, resulting in "File is corrupt" exceptions


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.4.1
    • Fix Version/s: None
    • Component/s: None

    Description

      On some occasions, when reading MapFiles generated by MapFileOutputFormat with BZIP2 BLOCK compression, calling getClosest(key, value, true) on the MapFile.Reader throws an IOException with the message "File is corrupt!". Running "hdfs fsck" reports everything as OK, and the underlying data and index files can also be read correctly with a SequenceFile.Reader.
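      A minimal sketch of the failing call pattern, assuming Hadoop 2.x APIs; the path and key/value types below are hypothetical, not taken from this report:

      ```java
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.MapFile;
      import org.apache.hadoop.io.Text;

      public class GetClosestRepro {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // A MapFile directory written by MapFileOutputFormat with
              // BZIP2 BLOCK compression (path is hypothetical).
              Path dir = new Path("/data/mapfile-output/part-r-00000");
              MapFile.Reader reader = new MapFile.Reader(dir, conf);
              try {
                  LongWritable key = new LongWritable(42L); // hypothetical key
                  Text value = new Text();
                  // With before=true the reader may seek backwards in the
                  // underlying SequenceFile; per this report, that can throw
                  // IOException("File is corrupt!") from readBlock().
                  reader.getClosest(key, value, true);
              } finally {
                  reader.close();
              }
          }
      }
      ```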

      The exception happens in the readBlock() method of the SequenceFile.Reader class.

      My guess is that, since MapFile.Reader's seekInternal() method calls seek() rather than sync(), it is never verified that the cursor is actually positioned at a valid record boundary.
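      To illustrate the distinction the guess above rests on, here is a hedged sketch using the Hadoop 2.x SequenceFile.Reader API: seek() jumps to an exact byte offset and trusts it is a safe read position, while sync() advances to the next sync marker at or after that offset. The file path and offset are hypothetical:

      ```java
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.SequenceFile;

      public class SeekVsSync {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // The "data" file inside a MapFile directory (hypothetical path).
              SequenceFile.Reader sf = new SequenceFile.Reader(conf,
                      SequenceFile.Reader.file(
                              new Path("/data/mapfile-output/part-r-00000/data")));
              try {
                  long offset = 123456L; // hypothetical position from the index

                  // What seekInternal() effectively does today: for a
                  // BLOCK-compressed file this can land mid-block.
                  sf.seek(offset);

                  // The safer alternative suggested here: skip forward to the
                  // next sync marker, which is always a valid read position.
                  sf.sync(offset);
              } finally {
                  sf.close();
              }
          }
      }
      ```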

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Jens Rabe (rabejens)
            Votes: 0
            Watchers: 1
