Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels: None

      Description

      Unlike gzip, the bzip2 file format supports splitting. Data is compressed in independent blocks (900k by default), and each block begins with a synchronization marker (a 48-bit approximation of pi). A reader can therefore seek to an arbitrary position, scan forward to the next marker, and start decompressing from there. This would permit a very large compressed file to be split across multiple map tasks, which is not currently possible unless a Hadoop-specific file format is used.
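The marker scan described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code attached to this issue: the function name `find_block_offsets` is hypothetical, and the sketch assumes the bzip2 block-start marker is the 48-bit value 0x314159265359 packed most-significant bit first. Because blocks are not byte-aligned, the scan slides a 48-bit window over the data one bit at a time.

```python
import bz2

# 48-bit block-start marker: the leading decimal digits of pi as hex nibbles.
BLOCK_MAGIC = 0x314159265359
MASK_48 = (1 << 48) - 1


def find_block_offsets(data: bytes) -> list:
    """Return the bit offset of every block-start marker found in `data`.

    Hypothetical helper for illustration; a real splitter would stream the
    input instead of holding it all in memory.
    """
    offsets = []
    window = 0      # the most recent 48 bits read
    bits_seen = 0   # total number of bits consumed so far
    for byte in data:
        for shift in range(7, -1, -1):  # bits are packed MSB first
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK_48
            bits_seen += 1
            if bits_seen >= 48 and window == BLOCK_MAGIC:
                offsets.append(bits_seen - 48)
    return offsets


if __name__ == "__main__":
    # Level 1 uses 100k blocks, so a 250,000-byte input spans several blocks.
    payload = bytes(i % 251 for i in range(250_000))
    compressed = bz2.compress(payload, compresslevel=1)
    for bit_offset in find_block_offsets(compressed):
        print("block starts at bit", bit_offset)
```

A split-aware reader would begin at its assigned byte offset, run a scan like this to find the first marker at or after that point, and hand the bits from there to the decompressor. The 48-bit pattern could in principle also occur by chance inside compressed data, so a production reader would need to tolerate a false match.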

        Attachments

        1. bzip2.jar
          27 kB
          Utkarsh Srivastava

              People

              • Assignee: Unassigned
              • Reporter: Doug Cutting
              • Votes: 2
              • Watchers: 7
