
Key: HADOOP-2664
Type: New Feature
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Chris Douglas
Reporter: Chris Douglas
Votes: 0
Watchers: 0
Hadoop Common

lzop-compatible CompressionCodec

Created: 19/Jan/08 03:22 AM   Updated: 20/Nov/08 11:38 PM
Component/s: io
Affects Version/s: None
Fix Version/s: 0.19.0

Time Tracking:
Not Specified

File Attachments:
2664-0.patch, 22 kB, 2008-01-19 03:27 AM, Chris Douglas (text file, licensed for inclusion in ASF works)
2664-1.patch, 23 kB, 2008-01-21 10:13 PM, Chris Douglas (text file, licensed for inclusion in ASF works)
2664-2.patch, 25 kB, 2008-06-11 11:04 PM, Chris Douglas (text file, licensed for inclusion in ASF works)
Issue Links:
Dependants: This issue depends on HADOOP-2402
Incorporates: This issue incorporates HADOOP-1694

Hadoop Flags: Reviewed
Release Note: Introduced LZOP codec.
Resolution Date: 30/Jun/08 03:01 PM


Description
The current lzo codec is not compatible with the standard .lzo file format used by lzop.

Chris Douglas made changes - 19/Jan/08 03:22 AM
Field Original Value New Value
Link This issue depends on HADOOP-2402 [ HADOOP-2402 ]
Chris Douglas added a comment - 19/Jan/08 03:27 AM
This patch adds lzop compatibility as an optional codec. On writes, it adds a generic header to .lzo files; on reads, it respects and confirms any block-checksum data specified in the header. It cannot be used with SequenceFiles.
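To illustrate what "compatible with lzop" means at the byte level, here is a minimal sketch that checks whether a stream begins with the 9-byte magic sequence that opens files written by the lzop tool. It is not taken from the patch: the class and method names are hypothetical, and it looks only at the magic bytes, not at the rest of the header or at any block checksums.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper, not part of the patch: detects the lzop magic bytes. */
class LzopMagicCheck {
  // Magic sequence at the start of an lzop file: 0x89 'L' 'Z' 'O' 0x00 CR LF 0x1a LF.
  static final byte[] LZOP_MAGIC = {
      (byte) 0x89, 'L', 'Z', 'O', 0x00, '\r', '\n', 0x1a, '\n' };

  /** Returns true if the first nine bytes of the stream equal the lzop magic. */
  static boolean startsWithLzopMagic(InputStream in) throws IOException {
    byte[] head = new byte[LZOP_MAGIC.length];
    try {
      new DataInputStream(in).readFully(head);
    } catch (EOFException e) {
      return false; // shorter than the magic, so it cannot be an lzop file
    }
    return Arrays.equals(head, LZOP_MAGIC);
  }

  public static void main(String[] args) throws IOException {
    // Usage: java LzopMagicCheck <file>
    try (InputStream in = new FileInputStream(args[0])) {
      System.out.println(startsWithLzopMagic(in));
    }
  }
}
```

A file that fails this check cannot be read by lzop, whatever its extension.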

Chris Douglas made changes - 19/Jan/08 03:27 AM
Attachment 2664-0.patch [ 12373587 ]
Chris Douglas made changes - 19/Jan/08 04:02 AM
Status Open [ 1 ] Patch Available [ 10002 ]
Hadoop QA added a comment - 20/Jan/08 06:48 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373587/2664-0.patch
against trunk revision r613499.

@author +1. The patch does not contain any @author tags.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new compiler warnings.

findbugs -1. The patch appears to introduce 4 new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1662/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1662/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1662/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1662/console

This message is automatically generated.


Chris Douglas added a comment - 21/Jan/08 10:13 PM
Fixed findbugs warnings, bumped the decompressor buffer to 256k (the size used by lzop), switched the decompressor to the "safe" code so that a too-small buffer cannot crash the JVM, and added some documentation.

I have some reservations about this patch (memory usage, thread safety if pooled, etc.), so I'm pushing it to 0.17.


Chris Douglas made changes - 21/Jan/08 10:13 PM
Attachment 2664-1.patch [ 12373706 ]
Chris Douglas made changes - 21/Jan/08 10:13 PM
Fix Version/s 0.16.0 [ 12312740 ]
Fix Version/s 0.17.0 [ 12312913 ]
Chris Douglas made changes - 22/Jan/08 08:00 PM
Status Patch Available [ 10002 ] Open [ 1 ]
Chris Douglas made changes - 22/Jan/08 08:01 PM
Status Open [ 1 ] Patch Available [ 10002 ]
Hadoop QA added a comment - 23/Jan/08 06:49 AM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373706/2664-1.patch
against trunk revision r614413.

@author +1. The patch does not contain any @author tags.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new compiler warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1680/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1680/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1680/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1680/console

This message is automatically generated.


Chris Douglas added a comment - 13/Feb/08 11:28 PM
-1

I'm pulling this back. Writing from the constructor and the (related) silent incompatibility with SequenceFile are sufficient to prevent it from being checked in. It reads and writes lzop-compatible files, but it is inadequate as a general compression codec. SequenceFile explicitly checks for a non-native version of GzipCodec, but surely there's a better way to effect this.

That said, it should be noted that one can still write ".lzo" files from LzoCodec that aren't actually in the lzop .lzo format. The incompatible change in this patch, which asserts precedence for the .lzo extension and changes LzoCodec's extension to .lzo_deflate, should be considered for 0.17 regardless of what happens with this patch.
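The effect of the extension change can be sketched with a small suffix lookup. This is an illustration only, not the lookup Hadoop performs: the class name, the map, and the method are hypothetical, and only the two extensions and codec names come from the comment above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the extension split described above. */
class CodecBySuffix {
  // After the change: .lzo means lzop-format files, .lzo_deflate the older LzoCodec output.
  private static final Map<String, String> SUFFIX_TO_CODEC = new LinkedHashMap<>();
  static {
    SUFFIX_TO_CODEC.put(".lzo", "LzopCodec");
    SUFFIX_TO_CODEC.put(".lzo_deflate", "LzoCodec");
  }

  /** Returns the codec name registered for the file's suffix, or null if none matches. */
  static String codecFor(String fileName) {
    for (Map.Entry<String, String> e : SUFFIX_TO_CODEC.entrySet()) {
      if (fileName.endsWith(e.getKey())) {
        return e.getValue();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(codecFor("part-00000.lzo"));
    System.out.println(codecFor("part-00000.lzo_deflate"));
  }
}
```

Because neither suffix is a suffix of the other, a file name matches at most one entry, so the two formats can no longer be confused by name.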


Chris Douglas made changes - 13/Feb/08 11:28 PM
Status Patch Available [ 10002 ] Open [ 1 ]
Robert Chansler made changes - 25/Mar/08 03:03 AM
Fix Version/s 0.17.0 [ 12312913 ]
Chris Douglas added a comment - 30/Apr/08 08:46 PM
I'm making this Patch Available again. The sin for which it was withdrawn, writing out the header in the constructor, is actually a fairly minor one (java.util.zip.GZIPOutputStream is also guilty of it). I'm not sure what to do with the SequenceFile incompatibility.
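The behavior attributed to the JDK's gzip stream can be observed directly. The sketch below uses only standard JDK classes; the wrapper class and method names are hypothetical. It captures what reaches the underlying stream from construction alone, before any call to write().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

/** Hypothetical demo: shows that GZIPOutputStream emits its header on construction. */
class HeaderInConstructor {
  /** Returns the bytes present in the sink immediately after the constructor returns. */
  static byte[] bytesAfterConstruction() throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    GZIPOutputStream gz = new GZIPOutputStream(sink);
    byte[] header = sink.toByteArray(); // snapshot taken before any write() call
    gz.close();
    return header;
  }

  public static void main(String[] args) throws IOException {
    byte[] header = bytesAfterConstruction();
    System.out.println("bytes written by constructor: " + header.length);
  }
}
```

Any caller that wraps a stream and then discards the wrapper without writing data still leaves header bytes behind, which is the hazard under discussion.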

Chris Douglas made changes - 30/Apr/08 08:46 PM
Status Open [ 1 ] Patch Available [ 10002 ]
Fix Version/s 0.18.0 [ 12312972 ]
Hadoop QA added a comment - 01/May/08 06:51 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373706/2664-1.patch
against trunk revision 645773.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2356/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2356/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2356/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2356/console

This message is automatically generated.


Chris Douglas made changes - 06/May/08 10:53 PM
Link This issue incorporates HADOOP-1694 [ HADOOP-1694 ]
Owen O'Malley added a comment - 22/May/08 04:47 PM
This really should have a unit test.

Owen O'Malley made changes - 22/May/08 04:47 PM
Status Patch Available [ 10002 ] Open [ 1 ]
Mukund Madhugiri made changes - 07/Jun/08 01:26 AM
Fix Version/s 0.18.0 [ 12312972 ]
Chris Douglas added a comment - 11/Jun/08 10:58 PM
Added a test and an entry to io.compression.codecs.

Chris Douglas made changes - 11/Jun/08 10:58 PM
Attachment 2664-2.patch [ 12383877 ]
Chris Douglas made changes - 11/Jun/08 10:58 PM
Fix Version/s 0.19.0 [ 12313211 ]
Hadoop Flags [Incompatible change]
Status Open [ 1 ] Patch Available [ 10002 ]
Chris Douglas made changes - 11/Jun/08 11:01 PM
Status Patch Available [ 10002 ] Open [ 1 ]
Chris Douglas made changes - 11/Jun/08 11:03 PM
Attachment 2664-2.patch [ 12383877 ]
Chris Douglas added a comment - 11/Jun/08 11:04 PM
Missed some files

Chris Douglas made changes - 11/Jun/08 11:04 PM
Attachment 2664-2.patch [ 12383878 ]
Chris Douglas made changes - 11/Jun/08 11:04 PM
Status Open [ 1 ] Patch Available [ 10002 ]
Hadoop QA added a comment - 12/Jun/08 03:19 AM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12383878/2664-2.patch
against trunk revision 666620.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2643/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2643/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2643/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2643/console

This message is automatically generated.


Repository Revision Date User Message
ASF #672788 Mon Jun 30 14:59:55 UTC 2008 omalley HADOOP-2664. Add a lzop compatible codec, so that files compressed by lzop
may be processed by map/reduce. Contributed by Chris Douglas.
Files Changed
MODIFY /hadoop/core/trunk/src/core/org/apache/hadoop/io/compress/lzo/LzoDecompressor.java
MODIFY /hadoop/core/trunk/CHANGES.txt
MODIFY /hadoop/core/trunk/src/core/org/apache/hadoop/io/compress/lzo/LzoCompressor.java
ADD /hadoop/core/trunk/src/core/org/apache/hadoop/io/compress/LzopCodec.java
MODIFY /hadoop/core/trunk/src/test/org/apache/hadoop/io/compress/TestCodec.java
MODIFY /hadoop/core/trunk/src/native/src/org/apache/hadoop/io/compress/lzo/LzoCompressor.c
MODIFY /hadoop/core/trunk/conf/hadoop-default.xml
MODIFY /hadoop/core/trunk/src/core/org/apache/hadoop/io/compress/LzoCodec.java
MODIFY /hadoop/core/trunk/src/native/src/org/apache/hadoop/io/compress/lzo/LzoDecompressor.c

Owen O'Malley added a comment - 30/Jun/08 03:01 PM
I just committed this. Thanks, Chris!

Owen O'Malley made changes - 30/Jun/08 03:01 PM
Resolution Fixed [ 1 ]
Hadoop Flags [Incompatible change] [Incompatible change, Reviewed]
Status Patch Available [ 10002 ] Resolved [ 5 ]
Hudson added a comment - 01/Jul/08 12:56 PM

Robert Chansler made changes - 21/Oct/08 11:54 PM
Release Note Introduced LZOP codec.
Hadoop Flags [Reviewed, Incompatible change] [Reviewed]
Nigel Daley made changes - 20/Nov/08 11:38 PM
Status Resolved [ 5 ] Closed [ 6 ]