  1. Hadoop Map/Reduce
  2. MAPREDUCE-830

Providing BZip2 splitting support for Text data

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: Splitting support for BZip2 Text data

    Description

      HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) adds support for BZip2-compressed data so that a compressed input file can be split at arbitrary points. This JIRA uses that functionality in LineRecordReader. The benefit is that when a user provides BZip2-compressed Text data, Hadoop splits it and multiple mappers process it, so BZip2-compressed data can make full use of the cluster. Currently, a BZip2-compressed Text file is not split and goes to a single mapper. The enhancement in this JIRA therefore provides splitting support and considerable performance gains.
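
      To illustrate the effect on parallelism, here is a minimal Java sketch of how many input splits (and hence map tasks) a file yields when it can or cannot be split. This is a simplified model, not Hadoop's actual split computation: the class name SplitCountSketch, the method countSplits, and the 1 GiB file and 128 MiB split size are all hypothetical, and the real FileInputFormat logic applies further rules (for example block locations and a slop factor) that are ignored here.

```java
public class SplitCountSketch {

    /**
     * Simplified model (an assumption, not Hadoop's real algorithm):
     * a non-splittable file yields exactly one split; a splittable file
     * yields ceil(fileBytes / splitBytes) splits.
     */
    static long countSplits(long fileBytes, long splitBytes, boolean splittable) {
        if (fileBytes <= 0 || splitBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (!splittable) {
            return 1; // the whole file goes to a single mapper
        }
        return (fileBytes + splitBytes - 1) / splitBytes; // ceiling division
    }

    public static void main(String[] args) {
        long fileBytes = 1024L * 1024 * 1024;  // hypothetical 1 GiB .bz2 file
        long splitBytes = 128L * 1024 * 1024;  // hypothetical 128 MiB split size

        // Before this change: BZip2 Text input is treated as non-splittable.
        System.out.println("not splittable: " + countSplits(fileBytes, splitBytes, false));
        // With splitting support: the same file is divided among several mappers.
        System.out.println("splittable:     " + countSplits(fileBytes, splitBytes, true));
    }
}
```

      Under these assumed sizes the model gives one map task without splitting and eight with it, which is the source of the performance gain described above.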

      Attachments

        1. MapReduce-830-version1.patch
          10 kB
          Abdul Qadeer
        2. M830-2.patch
          11 kB
          Christopher Douglas
        3. M830-3.patch
          28 kB
          Christopher Douglas
        4. M830-4.patch
          28 kB
          Christopher Douglas
        5. M830-4.patch
          28 kB
          Christopher Douglas

            People

              Assignee: Abdul Qadeer (aqadeer)
              Reporter: Abdul Qadeer (aqadeer)
              Votes: 0
              Watchers: 8
