Hadoop Common / HADOOP-1823

want InputFormat for bzip2 files


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels: None

    Description

      Unlike gzip, the bzip2 file format supports splitting. Data is compressed in blocks (900k by default), and blocks are separated by a synchronization marker (a 48-bit approximation of Pi). This would allow very large compressed files to be split across multiple map tasks, which is currently not possible without using a Hadoop-specific file format.
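      A minimal sketch of the core of the splitting idea, assuming a reader that is handed an arbitrary byte range: it scans forward for the next block-start marker (the BCD digits of Pi, 0x314159265359) and would begin decompressing there. Because bzip2 blocks are not byte-aligned, the scan has to test every bit offset, not every byte. The class and method names (Bzip2BlockScanner, findBlockMarkers) are hypothetical; this is not Hadoop's implementation, and it does not guard against the marker appearing by chance inside compressed data or handle the end-of-stream marker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: locates bzip2 block-start markers at any bit offset.
public class Bzip2BlockScanner {
    // 48-bit bzip2 block header magic: BCD digits of Pi (3.14159265359).
    static final long BLOCK_MAGIC = 0x314159265359L;
    static final long MASK_48 = (1L << 48) - 1;

    /** Returns the bit offsets (from the start of data) where a block marker begins. */
    public static List<Long> findBlockMarkers(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        long window = 0;   // the most recent 48 bits seen, most significant bit first
        long bitsRead = 0;
        for (byte b : data) {
            for (int i = 7; i >= 0; i--) {
                // Shift in the next bit and keep only the low 48 bits.
                window = ((window << 1) | ((b >> i) & 1)) & MASK_48;
                bitsRead++;
                if (bitsRead >= 48 && window == BLOCK_MAGIC) {
                    offsets.add(bitsRead - 48);
                }
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // A "BZh9" stream header followed by one byte-aligned block marker.
        byte[] data = {'B', 'Z', 'h', '9', 0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
        System.out.println(findBlockMarkers(data)); // [32]
    }
}
```

      In a splittable reader, each map task would take the first marker found at or after its split start and keep reading until it passes the split end, so every block is decompressed by exactly one task.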

      Attachments

        1. bzip2.jar (27 kB, Utkarsh Srivastava)


          People

            Assignee: Unassigned
            Reporter: Doug Cutting
            Votes: 2
            Watchers: 4
