  1. Hadoop Map/Reduce
  2. MAPREDUCE-1597

CombineFileInputFormat does not work with non-splittable files


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      CombineFileInputFormat.getSplits() does not take into account whether a file is splittable.
      This causes a problem for compressed text files: depending on the size of the compressed file,
      getSplits() may return more than one split for it, and the RecordReader for each of those splits
      will read the complete file.

      I ran into this problem while using Hive on Hadoop 0.20.
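
The behaviour described above can be illustrated with a minimal sketch. This is a simplified, hypothetical model of split planning, not the Hadoop API and not the attached patch: the class `SplitSketch`, the method `computeSplits`, and the `splittable` flag are all names invented for illustration. The flag stands in for whatever splittability check the input format applies to the file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of split planning for a single file.
// It is NOT the Hadoop API and NOT the fix attached to this issue.
class SplitSketch {

    // Returns {offset, length} pairs that together cover a file of fileLength bytes.
    static List<long[]> computeSplits(long fileLength, long maxSplitSize, boolean splittable) {
        List<long[]> splits = new ArrayList<>();
        if (fileLength <= 0) {
            return splits; // nothing to read
        }
        // A non-splittable file (for example gzip-compressed text) has to stay in
        // one split; otherwise every split's reader ends up reading the whole file.
        if (!splittable || maxSplitSize <= 0) {
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        // Splittable file: cut it into chunks of at most maxSplitSize bytes.
        for (long offset = 0; offset < fileLength; offset += maxSplitSize) {
            long length = Math.min(maxSplitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }
}
```

Under these assumptions, a 250-byte file with a 100-byte maximum split size yields three splits when `splittable` is true and a single split covering the whole file when it is false. Ignoring the flag, as the description reports for getSplits(), corresponds to always taking the three-split path.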

      Attachments

        1. patch-1597-ydist.txt
          26 kB
          Amareshwari Sriramadasu
        2. patch-1597.txt
          28 kB
          Amareshwari Sriramadasu


            People

              Assignee: Amareshwari Sriramadasu
              Reporter: Namit Jain
              Votes: 0
              Watchers: 7
