Hadoop Map/Reduce / MAPREDUCE-1597

CombineFileInputFormat does not work with non-splittable files

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      CombineFileInputFormat.getSplits() does not take into account whether a file is splittable.
      This can cause problems for compressed text files. For example, getSplits() may return more
      than one split depending on the size of the compressed file, and the RecordReader for each of
      those splits will read the complete file.

      I ran into this problem while using Hive on Hadoop 0.20.
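
The failure mode described above can be illustrated with a small, self-contained model. This is not Hadoop code: the class and both helpers (`naiveSplitCount`, `recordsProcessed`) are hypothetical, and the sketch assumes a planner that cuts a file purely by size, plus a reader that always starts from the beginning of a non-splittable file.

```java
class NonSplittableSplitDemo {
    // Hypothetical model: how many splits a size-only planner produces for one file.
    // It ignores whether the file can actually be read from the middle.
    static long naiveSplitCount(long fileLength, long maxSplitSize) {
        if (fileLength <= 0 || maxSplitSize <= 0) {
            return 0;
        }
        // Ceiling division: a partial trailing chunk still becomes a split.
        return (fileLength + maxSplitSize - 1) / maxSplitSize;
    }

    // If every split's reader reads the complete file, each record is
    // processed once per split instead of once overall.
    static long recordsProcessed(long recordsInFile, long splitCount) {
        return recordsInFile * splitCount;
    }

    public static void main(String[] args) {
        long fileLength = 300L * 1024 * 1024;    // assumed 300 MB compressed file
        long maxSplitSize = 128L * 1024 * 1024;  // assumed 128 MB split limit
        long splits = naiveSplitCount(fileLength, maxSplitSize);
        System.out.println("splits = " + splits);
        System.out.println("records processed = " + recordsProcessed(1000, splits));
    }
}
```

With these assumed sizes the model yields three splits, so a file holding 1000 records would be counted as 3000.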

      1. patch-1597.txt
        28 kB
        Amareshwari Sriramadasu
      2. patch-1597-ydist.txt
        26 kB
        Amareshwari Sriramadasu

        Issue Links

          Activity

          Namit Jain created issue -
          Zheng Shao made changes -
          Field Original Value New Value
          Link This issue duplicates MAPREDUCE-1649 [ MAPREDUCE-1649 ]
          Amareshwari Sriramadasu made changes -
          Assignee Amareshwari Sriramadasu [ amareshwari ]
          Amareshwari Sriramadasu added a comment -

          This patch adds support for non-splittable files in CombineFileInputFormat. If a file is not splittable, it generates a single OneBlockInfo covering the full file length.
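
A rough sketch of the idea in this comment, written as standalone Java rather than against the Hadoop sources. Only the rule "non-splittable file gives one block with the full file length" comes from the comment; `BlockRange` (a simplified stand-in for OneBlockInfo), `planBlocks`, and the plain boolean `splittable` flag are assumptions made for illustration, and the actual patch may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

class BlockPlanSketch {
    // Simplified, hypothetical stand-in for OneBlockInfo: a byte range within a file.
    static final class BlockRange {
        final long offset;
        final long length;

        BlockRange(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Plans the byte ranges for one file. How splittability is determined is
    // outside this sketch, so it is passed in as a flag.
    static List<BlockRange> planBlocks(long fileLength, long blockSize, boolean splittable) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        List<BlockRange> blocks = new ArrayList<>();
        if (fileLength == 0) {
            return blocks;
        }
        if (!splittable) {
            // Non-splittable: one range spanning the whole file, so only one reader sees it.
            blocks.add(new BlockRange(0, fileLength));
            return blocks;
        }
        // Splittable: one range per block, with a shorter final range if needed.
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            blocks.add(new BlockRange(offset, Math.min(blockSize, fileLength - offset)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        System.out.println(planBlocks(300, 128, false).size()); // non-splittable file
        System.out.println(planBlocks(300, 128, true).size());  // splittable file
    }
}
```

Because the whole file stays in a single range, it can end up in only one combined split, which avoids the repeated full-file reads from the description.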

          Amareshwari Sriramadasu made changes -
          Attachment patch-1597.txt [ 12453938 ]
          Amareshwari Sriramadasu added a comment -

          Dhruba, can you please have a look at the patch? Thanks.

          Amareshwari Sriramadasu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 0.22.0 [ 12314184 ]
          dhruba borthakur added a comment -

          +1, code looks good. This patch will have to be merged with the recently committed one from MAPREDUCE-2046.

          Amareshwari Sriramadasu added a comment -

          "This patch would have to get merged with the recently committed one from MAPREDUCE-2046"

          Thanks, Dhruba, for the review. The uploaded patch is already merged with the MAPREDUCE-2046 commit. I will post the test results soon.

          Amareshwari Sriramadasu added a comment -

          test-patch result:

               [exec]
               [exec] +1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec]
          

          All core and contrib unit tests passed.

          dhruba borthakur added a comment -

          +1

          Amareshwari Sriramadasu added a comment -

          I just committed this.

          Amareshwari Sriramadasu made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Amareshwari Sriramadasu added a comment -

          Patch for Yahoo! distribution, on top of MAPREDUCE-2046.

          Amareshwari Sriramadasu made changes -
          Attachment patch-1597-ydist.txt [ 12455637 ]
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)

          Konstantin Shvachko made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Steven Wong made changes -
          Link This issue supersedes HIVE-2089 [ HIVE-2089 ]
          Jarek Jarcec Cecho made changes -
          Link This issue is related to SQOOP-721 [ SQOOP-721 ]
          Rajat Khandelwal made changes -
          Link This issue blocks HIVE-11376 [ HIVE-11376 ]
          Transition                    Time In Source Status   Execution Times   Last Executer             Last Execution Date
          Open -> Patch Available       177d 12h 56m            1                 Amareshwari Sriramadasu   06/Sep/10 10:53
          Patch Available -> Resolved   22h 13m                 1                 Amareshwari Sriramadasu   07/Sep/10 09:06
          Resolved -> Closed            460d 21h 12m            1                 Konstantin Shvachko       12/Dec/11 06:19

            People

            • Assignee:
              Amareshwari Sriramadasu
              Reporter:
              Namit Jain
            • Votes:
              0
              Watchers:
              6
