Hadoop Map/Reduce · MAPREDUCE-5076

CombineFileInputFormat can create splits that exceed maxSplitSize


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      I ran a local job with CombineFileInputFormat using an 80 MB file and a max split size of 32 MB (the default local FS block size). The job ran with two splits of 32 MB, and the last 16 MB were just omitted.

      This appears to be caused by a subtle bug in getMoreSplits: the code that generates splits from the blocks expects the 16 MB block to be at the end of the block list, but the code that generates the blocks does not guarantee this ordering.
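
      The failure mode can be illustrated with a small sketch. This is not Hadoop's actual getMoreSplits code; it is a hypothetical stand-in showing how an 80 MB file with a 32 MB maxSplitSize yields blocks of 32, 32, and 16 MB, and how split generation must flush whatever remains after the loop rather than assume every block is exactly maxSplitSize:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Hadoop's CombineFileInputFormat code) of
// combine-style split generation. The key point: after iterating over
// the blocks, any partially filled split must still be emitted,
// otherwise a trailing undersized block (the 16 MB here) is dropped.
public class SplitSketch {
    static final long MB = 1024L * 1024L;

    // Accumulate blocks into splits, emitting a split whenever adding
    // the next block would exceed maxSplitSize, and flushing the
    // remainder at the end.
    static List<Long> makeSplits(long[] blockSizes, long maxSplitSize) {
        List<Long> splits = new ArrayList<>();
        long current = 0;
        for (long b : blockSizes) {
            if (current > 0 && current + b > maxSplitSize) {
                splits.add(current);
                current = 0;
            }
            current += b;
        }
        if (current > 0) {
            splits.add(current); // without this flush, the last 16 MB vanishes
        }
        return splits;
    }

    public static void main(String[] args) {
        long[] blocks = {32 * MB, 32 * MB, 16 * MB}; // 80 MB file
        List<Long> splits = makeSplits(blocks, 32 * MB);
        long total = 0;
        for (long s : splits) total += s;
        System.out.println(splits.size() + " splits covering " + (total / MB) + " MB");
    }
}
```

      With the flush in place the job sees three splits covering all 80 MB; the reported bug corresponds to the flush (or the ordering assumption behind it) going wrong, so only the two 32 MB splits survive.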


People

    Assignee: Sandy Ryza (sandyr)
    Reporter: Sandy Ryza (sandyr)
    Votes: 0
    Watchers: 5
