Hadoop Common / HADOOP-1823

want InputFormat for bzip2 files


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels: None

    Description

      Unlike gzip, the bzip2 file format supports splitting. Data is compressed in independent blocks (900k by default), and blocks are separated by a synchronization marker (a 48-bit approximation of Pi). This would allow very large compressed files to be split across multiple map tasks, which is currently possible only with a Hadoop-specific file format.
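      The splitting idea above depends on a reader being able to find the next block marker from an arbitrary split offset. A minimal Java sketch of that search follows, assuming the compressed bytes are already in memory; `Bzip2BlockScanner` and `nextBlockStart` are hypothetical names, not Hadoop's API or the attached implementation. It relies on two properties of the bzip2 format: the block marker is the 48-bit value 0x314159265359 (the BCD digits of Pi), and blocks are bit-aligned rather than byte-aligned, so every bit offset has to be tested.

```java
// Hypothetical helper (not Hadoop's API): finds bzip2 block boundaries by
// scanning for the 48-bit block marker. Blocks are bit-aligned, so every
// bit offset must be tested, not only byte boundaries.
final class Bzip2BlockScanner {
    // Block header magic: the BCD digits of Pi, 3.14159265359.
    static final long BLOCK_MAGIC = 0x314159265359L;
    private static final long MASK_48 = (1L << 48) - 1;

    private Bzip2BlockScanner() {}

    /**
     * Returns the bit offset of the first block marker that starts at or
     * after startBit, or -1 if there is none. Bits are read MSB-first
     * within each byte, which is how the bzip2 bit stream is packed.
     */
    static long nextBlockStart(byte[] data, long startBit) {
        if (startBit < 0) {
            throw new IllegalArgumentException("startBit must be >= 0");
        }
        long window = 0;    // the most recent 48 bits, newest bit in the LSB
        long bitsRead = 0;  // bits consumed since startBit
        long totalBits = (long) data.length * 8;
        for (long bit = startBit; bit < totalBits; bit++) {
            int octet = data[(int) (bit >>> 3)] & 0xFF;
            int bitVal = (octet >>> (7 - (int) (bit & 7))) & 1;
            window = ((window << 1) | bitVal) & MASK_48;
            bitsRead++;
            if (bitsRead >= 48 && window == BLOCK_MAGIC) {
                return bit - 47; // the marker began 47 bits before this bit
            }
        }
        return -1;
    }
}
```

      A split reader could call `nextBlockStart(data, splitStartByte * 8L)` and begin decompressing at the returned bit offset, leaving any earlier partial block to the previous split. Because the 48-bit pattern can in principle also occur by chance inside compressed data, a real reader would need to validate the block (for example, against its CRC) before trusting the boundary.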

      Attachments

        1. bzip2.jar (27 kB), attached by Utkarsh Srivastava


            People

              Assignee: Unassigned
              Reporter: Doug Cutting
              Votes: 2
              Watchers: 4
