  1. Hadoop Map/Reduce
  2. MAPREDUCE-830

Providing BZip2 splitting support for Text data

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Splitting support for BZip2 Text data

      Description

      HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) adds support for handling BZip2-compressed data so that a compressed input file can be split at arbitrary points. This JIRA uses that functionality in LineRecordReader. As a result, when a user provides BZip2-compressed Text data, Hadoop splits it and multiple mappers process it, so BZip2-compressed data can make full use of the cluster. Currently, a BZip2-compressed Text file is not split and goes to a single mapper. The enhancement in this JIRA therefore provides splitting support and a considerable performance gain.
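
      The split handling a line-oriented reader needs can be sketched as follows. This is a minimal illustration on uncompressed bytes, not the code from the attached patches: the class SplitLineSketch and the method readSplit are hypothetical names, and the sketch assumes '\n' line endings. It shows one common boundary rule, where a split that does not start at offset 0 skips its first (possibly partial) line, and every split keeps reading until it has consumed the line that crosses its end. With BZip2 input, the codec from HADOOP-4012 would additionally have to position each reader at a compressed block boundary, which this sketch does not model.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitLineSketch {

    /**
     * Returns the lines owned by the byte range [start, end) of data.
     * Hypothetical helper for illustration; assumes '\n' terminated lines.
     */
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;

        // A split that does not begin the file skips up to and including the
        // first newline. The previous split is responsible for that line.
        if (start != 0) {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline itself
        }

        List<String> lines = new ArrayList<>();
        // Read every line that begins at or before 'end'. The "<=" makes this
        // split consume the line that the next split will skip.
        while (pos <= end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step past the newline
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 7; // arbitrary size, deliberately not aligned to line ends

        // Each iteration stands in for one mapper reading its own split.
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            System.out.println("[" + start + ", " + end + ") -> " + readSplit(data, start, end));
        }
    }
}
```

      Under this rule every line is returned by exactly one split, regardless of where the split boundaries fall, which is the property that lets several mappers share one input file.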

        Attachments

        1. M830-2.patch
          11 kB
          Chris Douglas
        2. M830-3.patch
          28 kB
          Chris Douglas
        3. M830-4.patch
          28 kB
          Chris Douglas
        4. M830-4.patch
          28 kB
          Chris Douglas
        5. MapReduce-830-version1.patch
          10 kB
          Abdul Qadeer

              People

              • Assignee:
                Abdul Qadeer (aqadeer)
              • Reporter:
                Abdul Qadeer (aqadeer)
              • Votes:
                0
              • Watchers:
                8
