Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
bzip2 is provided as a codec in 0.19.0: https://issues.apache.org/jira/browse/HADOOP-3646
Description
Unlike gzip, the bzip2 file format supports splitting. Data is compressed in independent blocks (900k by default), and blocks are separated by a synchronization marker (a 48-bit approximation of pi, 0x314159265359). This would allow very large compressed files to be split across multiple map tasks, which is currently not possible without a Hadoop-specific file format.
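A split reader would have to locate block boundaries by scanning for the synchronization marker. The sketch below is illustrative only and is not Hadoop's implementation. It assumes Python 3.9+ and uses the standard `bz2` module solely to produce test input. The function name `find_block_offsets` is hypothetical. Because bzip2 blocks are not byte-aligned, the scan slides a 48-bit window over the stream one bit at a time instead of searching for a byte pattern.

```python
import bz2
import random

# bzip2 block header magic: the 48-bit BCD encoding of pi (3.14159265359).
BLOCK_MAGIC = 0x314159265359
MAGIC_BITS = 48
MASK = (1 << MAGIC_BITS) - 1


def find_block_offsets(data: bytes) -> list[int]:
    """Hypothetical helper: return the bit offsets where a block magic starts."""
    offsets = []
    window = 0
    bit_pos = 0
    for byte in data:
        for shift in range(7, -1, -1):  # bzip2 writes bits MSB-first
            # Shift the next bit into a 48-bit sliding window.
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK
            bit_pos += 1
            if bit_pos >= MAGIC_BITS and window == BLOCK_MAGIC:
                offsets.append(bit_pos - MAGIC_BITS)
    return offsets


if __name__ == "__main__":
    # Random input barely compresses, so at level 1 (100k blocks) it spans several blocks.
    data = random.Random(0).randbytes(250_000)
    compressed = bz2.compress(data, compresslevel=1)
    print(find_block_offsets(compressed))
```

The first offset is always 32, immediately after the 4-byte `BZh1` stream header. The 48-bit pattern can in principle also occur inside compressed data. A real splitter would therefore need to confirm each candidate, for example by checking that a block decodes successfully from that position.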
Attachments
Issue Links
- relates to: HADOOP-3646 Providing bzip2 as codec (Closed)