[HADOOP-2424] lzop compatible CompressionCodec - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: io, native
Labels: None

Description

LzoCodec currently outputs at most io.compression.codec.lzo.buffersize bytes (default 64k), less the compression overhead, per write (~~HADOOP-2402~~). Each write uses the following format, with field widths given in bits:

[uncompressed block length (32)]
[compressed block length (32)]
[compressed block]
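
As a rough illustration of the layout above, the following sketch frames one already-compressed block. The class and method names are hypothetical and not part of Hadoop, and big-endian byte order for the two length fields is an assumption; the description does not state it.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Hadoop: frames one already-compressed
// block as [uncompressed length][compressed length][compressed block].
class LzoCodecFrameSketch {
    static byte[] frame(int uncompressedLength, byte[] compressedBlock) {
        // ByteBuffer writes ints big-endian by default (byte order is an assumption here).
        return ByteBuffer.allocate(8 + compressedBlock.length)
                .putInt(uncompressedLength)       // uncompressed block length (32 bits)
                .putInt(compressedBlock.length)   // compressed block length (32 bits)
                .put(compressedBlock)             // compressed block
                .array();
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for real LZO output.
        byte[] framed = frame(65536, new byte[] {1, 2, 3});
        System.out.println(framed.length);
    }
}
```

The per-block overhead in this layout is 8 bytes: two 32-bit length fields ahead of the payload.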

lzop (lzo-backed command-line utility) writes blocks in the following format:

[uncompressed block length(32)]
[compressed block length (32)]
[Adler-32|CRC-32 checksum of uncompressed block (32)]
[Adler-32|CRC-32 checksum of compressed block (32)]
[compressed block]
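
A comparable sketch for the lzop block layout, again with hypothetical names. It assumes both checksums are present and use the same algorithm, that the lengths and checksums are written big-endian, and that the standard `java.util.zip` Adler-32 and CRC-32 implementations compute the same values lzop does. None of this is confirmed by the description, and the file header is not covered.

```java
import java.nio.ByteBuffer;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical helper: frames one block in the lzop layout listed above.
class LzopBlockSketch {
    // Computes a 32-bit checksum with either CRC-32 or Adler-32.
    static int checksum(byte[] data, boolean useCrc32) {
        Checksum c = useCrc32 ? new CRC32() : new Adler32();
        c.update(data, 0, data.length);
        return (int) c.getValue(); // the low 32 bits hold the checksum
    }

    static byte[] frame(byte[] uncompressed, byte[] compressed, boolean useCrc32) {
        return ByteBuffer.allocate(16 + compressed.length)
                .putInt(uncompressed.length)               // uncompressed block length
                .putInt(compressed.length)                 // compressed block length
                .putInt(checksum(uncompressed, useCrc32))  // checksum of uncompressed block
                .putInt(checksum(compressed, useCrc32))    // checksum of compressed block
                .put(compressed)                           // compressed block
                .array();
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for real LZO output.
        byte[] framed = frame(new byte[] {97}, new byte[] {1, 2}, false);
        System.out.println(framed.length);
    }
}
```

Compared with the LzoCodec layout, the two checksum fields add 8 bytes per block, so a reader of one format cannot parse the other without knowing which framing is in use.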

An lzop file also begins with a header of roughly 32 bytes. I don't know of a published standard for the format, but the lzop source should suffice as a reference.

Since we're using ".lzo" as the default extension, it's worth considering compatibility with lzop, though not necessarily for all lzo-compressed blocks. For example, SequenceFiles should keep using the existing LzoCodec format.

People

Assignee: Unassigned

Reporter: Christopher Douglas

Votes: 0

Watchers: 0

Dates

Created: 14/Dec/07 00:16

Updated: 04/Jan/08 18:33

Resolved: 14/Dec/07 00:23