MAPREDUCE-1649 (Hadoop Map/Reduce)

Compressed files with TextInputFormat does not work with CombineFileInputFormat


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.20.2

    Description

      CombineFileInputFormat creates splits based on blocks, regardless of whether the underlying FileInputFormat is splittable.

      This means that a compressed text file read with TextInputFormat can end up with two or more splits. For each of these splits, TextInputFormat.getRecordReader returns a RecordReader for the whole compressed file, so the input data is duplicated.
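
      The reported behavior can be illustrated with a small toy model. This is a sketch under stated assumptions, not Hadoop code: the class CombineSplitSketch, the Split type, and the methods blockSplits, safeSplits, readCompressed, and totalRecordsRead are all hypothetical names invented for illustration. The model assumes, as the description states, that a reader for a compressed file ignores the split's byte range and returns every record in the file.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the reported behavior. Nothing here is a Hadoop API;
// every name is hypothetical and chosen for illustration only.
public class CombineSplitSketch {

    // A split is a byte range [start, start + length) of a single file.
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    // Block-based planning: one split per block, ignoring splittability.
    static List<Split> blockSplits(long fileLength, long blockSize) {
        List<Split> splits = new ArrayList<>();
        for (long off = 0; off < fileLength; off += blockSize) {
            splits.add(new Split(off, Math.min(blockSize, fileLength - off)));
        }
        return splits;
    }

    // Planning that respects splittability: a non-splittable file
    // always becomes exactly one split covering the whole file.
    static List<Split> safeSplits(long fileLength, long blockSize, boolean splittable) {
        if (!splittable) {
            List<Split> whole = new ArrayList<>();
            whole.add(new Split(0, fileLength));
            return whole;
        }
        return blockSplits(fileLength, blockSize);
    }

    // Assumed reader behavior for a compressed file: the split's range
    // is ignored and every record of the file is returned.
    static List<String> readCompressed(Split split, List<String> allRecords) {
        return new ArrayList<>(allRecords);
    }

    // Total number of records seen by the job across all splits.
    static int totalRecordsRead(List<Split> splits, List<String> records) {
        int total = 0;
        for (Split s : splits) {
            total += readCompressed(s, records).size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "c");
        long fileLength = 200;
        long blockSize = 128;

        // Two block-based splits, each reading the whole file: prints 6.
        System.out.println(totalRecordsRead(blockSplits(fileLength, blockSize), records));

        // One split for the non-splittable file: prints 3.
        System.out.println(totalRecordsRead(safeSplits(fileLength, blockSize, false), records));
    }
}
```

      In this model, a 200-byte file with a 128-byte block size yields two block-based splits, so the three records are read twice. Planning a single split for any non-splittable file removes the duplication.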

            People

              Assignee: Zheng Shao (zshao)
              Reporter: Zheng Shao (zshao)