Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 0.14.0
- Fix Version/s: None
- Component/s: None
- Labels: None
Description
When running the wordcount example with plain-text, gzip-compressed, and lzo-compressed input files, the lzo-compressed files are not recognized as compressed and are read as plain text.
With an input dir of
/user/hadoopqa/input/part-001.txt
/user/hadoopqa/input/part-002.txt.gz
/user/hadoopqa/input/part-003.txt.lzo
and running this command
bin/hadoopqa jar hadoop-examples.jar wordcount /user/hadoopqa/input /user/hadoopqa/output
I get output that looks like
row 4
royal 4
rt$3-ex?ÔøΩ?÷µIStÔøΩ"4D%ÔøΩ9$UÔøΩÔøΩ"ÔøΩ, 1
ru$ÔøΩÔøΩ#~t"@ÔøΩm*d#\/$ÔøΩÔøΩl.t"XÔøΩÔøΩDi" 1
rubbÔøΩdÔøΩ&@bT 1
rubbed 2
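The garbage keys are what you would expect if the reader picks a decompressor from the file name suffix and falls back to raw bytes when no codec claims that suffix. The report implies this but does not state it, so the following is only a minimal Python model of that dispatch, not Hadoop's actual code. `CODECS_BY_SUFFIX`, `open_input`, and `read_bytes` are hypothetical names, and the table deliberately has no `.lzo` entry in order to reproduce the fallback.

```python
import gzip
import os

# Hypothetical suffix table; ".lzo" is deliberately absent to model the report.
CODECS_BY_SUFFIX = {".gz": gzip.open}


def open_input(path):
    """Open path with the codec registered for its suffix, else as raw bytes."""
    suffix = os.path.splitext(path)[1]
    opener = CODECS_BY_SUFFIX.get(suffix, open)
    return opener(path, "rb")


def read_bytes(path):
    """Return the bytes a downstream tokenizer would see for this file."""
    with open_input(path) as stream:
        return stream.read()
```

In this model a `.gz` file comes back decompressed, while a `.lzo` file comes back byte-for-byte as stored, so a word counter would tokenize compressed data.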
To lzo compress the file I used lzop:
http://www.lzop.org/download/lzop-1.01-linux_i386.tar.gz
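When reproducing this, it may help to confirm that each input file really is in the container format its suffix suggests. The sketch below assumes the files have been copied to the local filesystem and checks only the leading magic bytes: the two-byte gzip signature and the nine-byte signature that lzop writes at the start of its files. `sniff_format` is a hypothetical helper name, not part of Hadoop or lzop.

```python
GZIP_MAGIC = b"\x1f\x8b"                 # gzip member header (RFC 1952)
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"    # signature written by the lzop tool


def sniff_format(path):
    """Return 'gzip', 'lzop', or 'unknown' based on the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(len(LZOP_MAGIC))
    if head.startswith(LZOP_MAGIC):
        return "lzop"
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"
```

A plain-text file such as part-001.txt would be reported as `unknown`, since the check only distinguishes the two compressed formats.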
Attachments
Issue Links
- is part of: HADOOP-2664 lzop-compatible CompressionCodec (Closed)