Hadoop Map/Reduce / MAPREDUCE-830

Providing BZip2 splitting support for Text data


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: Splitting support for BZip2 Text data

    Description

      HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) adds support for handling BZip2-compressed data so that a compressed input file can be split at arbitrary points. This JIRA uses that functionality in LineRecordReader. The benefit is that when a user provides BZip2-compressed Text data, Hadoop splits it and multiple mappers process it, so BZip2-compressed data can make full use of the cluster. Currently, a BZip2-compressed Text file goes to a single mapper and is not split. The enhancement in this JIRA therefore provides splitting support and considerable performance gains.
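
      To illustrate what "split and processed by multiple mappers" asks of a line reader, here is a minimal sketch of one common line-ownership convention for byte-range splits. It is not code from the attached patches and it does not touch BZip2 at all: it works on an uncompressed in-memory buffer, and the class and method names (SplitLineReaderSketch, readSplit, readAllSplits) are hypothetical. The assumed rule is that a reader whose split does not begin at offset 0 discards everything up to the first newline, and every reader keeps reading lines as long as a line begins at or before the end of its split. Under that assumption each line is returned by exactly one split, even when a split boundary falls in the middle of a line.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a split-aware line reader over an uncompressed byte array.
// A real BZip2 reader would additionally have to align to compressed block
// boundaries, which this sketch does not attempt.
class SplitLineReaderSketch {

    /** Returns the lines owned by the split covering bytes [start, end). */
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Any line in progress at (or beginning exactly at) `start` is
            // owned by the previous split, so skip past the next newline.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step over the newline itself
        }
        List<String> lines = new ArrayList<>();
        // Read every line that begins at or before `end`; the last one may
        // extend beyond the split boundary.
        while (pos < data.length && pos <= end) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step over the newline
        }
        return lines;
    }

    /** Cuts the data into fixed-size splits and concatenates what each split reads. */
    static List<String> readAllSplits(byte[] data, int splitSize) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            all.addAll(readSplit(data, start, end));
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        // Print which lines each 8-byte split would hand to its mapper.
        for (int start = 0; start < data.length; start += 8) {
            int end = Math.min(start + 8, data.length);
            System.out.println("[" + start + ", " + end + ") -> " + readSplit(data, start, end));
        }
    }
}
```

      Each call to readSplit stands in for one mapper reading its own byte range independently. The point of the convention is that no coordination between readers is needed: the split offsets alone decide which reader owns which line.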

      Attachments

        1. MapReduce-830-version1.patch
          10 kB
          Abdul Qadeer
        2. M830-4.patch
          28 kB
          Christopher Douglas
        3. M830-4.patch
          28 kB
          Christopher Douglas
        4. M830-3.patch
          28 kB
          Christopher Douglas
        5. M830-2.patch
          11 kB
          Christopher Douglas

          People

            Assignee: Abdul Qadeer (aqadeer)
            Reporter: Abdul Qadeer (aqadeer)
            Votes: 0
            Watchers: 8
