Hadoop Map/Reduce: MAPREDUCE-1597

CombineFileInputFormat does not work with non-splittable files

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      CombineFileInputFormat.getSplits() does not take into account whether a file is splittable.
      This causes a problem for compressed text files. For example, getSplits() may return more
      than one split, depending on the size of the compressed file, and each split's record reader
      will then read the complete file.

      I ran into this problem while using Hive on Hadoop 0.20.
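
      The effect described above can be illustrated with a toy model. This is only a sketch, not the Hadoop code: the class and method names (NaiveSplitSketch, naiveRanges, bytesReadIfNotSplittable) and the file and split sizes are hypothetical, and the real getSplits() logic is more involved than a plain size-based cut. The point is only that cutting a non-splittable file by size multiplies the work, because each resulting split ends up reading the whole file.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the reported behaviour; not the Hadoop implementation. */
final class NaiveSplitSketch {

    /**
     * Cuts a file into {offset, length} ranges of at most maxSplitSize bytes,
     * without checking whether the file can actually be split.
     */
    static List<long[]> naiveRanges(long fileLength, long maxSplitSize) {
        if (maxSplitSize <= 0) {
            throw new IllegalArgumentException("maxSplitSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += maxSplitSize) {
            long length = Math.min(maxSplitSize, fileLength - offset);
            ranges.add(new long[] {offset, length});
        }
        return ranges;
    }

    /**
     * Total bytes read when the file is not splittable, assuming (as the
     * report states) that every split's record reader reads the whole file.
     */
    static long bytesReadIfNotSplittable(long fileLength, long maxSplitSize) {
        return naiveRanges(fileLength, maxSplitSize).size() * fileLength;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 300 * mb;   // hypothetical compressed file
        long maxSplitSize = 128 * mb; // hypothetical maximum split size
        System.out.println("splits: " + naiveRanges(fileLength, maxSplitSize).size());
        System.out.println("MB read: " + bytesReadIfNotSplittable(fileLength, maxSplitSize) / mb);
    }
}
```

      With these assumed sizes the model produces three ranges, so a 300 MB file would be read three times (900 MB in total), and its records would be processed once per split.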

      1. patch-1597.txt (28 kB, Amareshwari Sriramadasu)
      2. patch-1597-ydist.txt (26 kB, Amareshwari Sriramadasu)

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)

          Amareshwari Sriramadasu added a comment -

          Patch for Yahoo! distribution, on top of MAPREDUCE-2046.

          Amareshwari Sriramadasu added a comment -

          I just committed this.

          dhruba borthakur added a comment -

          +1

          Amareshwari Sriramadasu added a comment -

          test-patch result:

               [exec]
               [exec] +1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec]
          

          All core and contrib unit tests passed.

          Amareshwari Sriramadasu added a comment -

          "This patch would have to get merged with the recently committed one from MAPREDUCE-2046"

          Thanks, Dhruba, for the review. The uploaded patch is already merged with the MAPREDUCE-2046 commit. I will update the test results soon.

          dhruba borthakur added a comment -

          +1, code looks good. This patch will have to be merged with the recently committed one from MAPREDUCE-2046.

          Amareshwari Sriramadasu added a comment -

          Dhruba, can you please have a look at the patch? Thanks.

          Amareshwari Sriramadasu added a comment -

          This patch adds support for non-splittable files in CombineFileInputFormat. If a file is not splittable, it generates a single OneBlockInfo covering the full file length.
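
          The idea in this comment can be sketched roughly as follows. This is not the patch itself: Block is a hypothetical stand-in for OneBlockInfo, and blocksFor and its parameters are names assumed for illustration. The sketch only shows the branching: a splittable file is cut into block-sized chunks, while a non-splittable file yields one chunk spanning the whole file.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and structure are assumptions, not the patch. */
final class BlockInfoSketch {

    /** Hypothetical stand-in for OneBlockInfo: one chunk of one file. */
    static final class Block {
        final long offset;
        final long length;

        Block(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Lists the chunks of a file, honouring whether the file can be split. */
    static List<Block> blocksFor(long fileLength, long blockSize, boolean splittable) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Block> blocks = new ArrayList<>();
        if (!splittable) {
            // Non-splittable file: a single chunk with the full file length.
            blocks.add(new Block(0, fileLength));
            return blocks;
        }
        // Splittable file: one chunk per block; the last one may be shorter.
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            blocks.add(new Block(offset, Math.min(blockSize, fileLength - offset)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical sizes: a 300 MB file with 128 MB blocks.
        System.out.println("splittable chunks: " + blocksFor(300 * mb, 128 * mb, true).size());
        System.out.println("non-splittable chunks: " + blocksFor(300 * mb, 128 * mb, false).size());
    }
}
```

          Because a non-splittable file contributes exactly one chunk, it can still be combined with other files into a split, but it is never handed to more than one record reader.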

            People

            • Assignee: Amareshwari Sriramadasu
            • Reporter: Namit Jain
            • Votes: 0
            • Watchers: 6
