Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.20.2
None
None
Environment:
Cluster: CentOS 5, jdk1.6.0_20
Client: Mac OS X Snow Leopard, jdk1.6.0_20
SequenceFile.Reader,Gzip
Description
A Hadoop job writes a gzip-compressed sequence file (the problem occurs with both record compression and block compression). When a client program uses SequenceFile.Reader to read this file, it fails with the following exception:
2090 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2091 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
Exception in thread "main" java.io.EOFException
at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:170)
at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:180)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
at com.shiningware.intelligenceonline.taobao.mapreduce.HtmlContentSeqOutputView.main(HtmlContentSeqOutputView.java:28)
I studied the code in the org.apache.hadoop.io.SequenceFile.Reader.init method, which reads:

    // Initialize... not if this we are constructing a temporary Reader
    if (!tempReader) {
      valBuffer = new DataInputBuffer();
      if (decompress) {
        valDecompressor = CodecPool.getDecompressor(codec);
        valInFilter = codec.createInputStream(valBuffer, valDecompressor);
        valIn = new DataInputStream(valInFilter);
      } else {
        valIn = valBuffer;
      }
      // ...
    }

The problem seems to be caused by "valBuffer = new DataInputBuffer();". GzipCodec.createInputStream creates an instance of GzipInputStream, whose constructor creates an instance of ResetableGZIPInputStream. When the ResetableGZIPInputStream constructor calls its base-class constructor, java.util.zip.GZIPInputStream, that constructor immediately tries to read the gzip header from the newly created, still empty valBuffer. It finds no content, so it throws an EOFException.
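A minimal, self-contained sketch of the failure mode described above, using only java.util.zip and no Hadoop classes. java.util.zip.GZIPInputStream reads the gzip header inside its constructor, so wrapping a stream that is still empty throws EOFException, as in the stack trace, while the same construction succeeds once the stream holds a complete gzip member. RefillableInput is a hypothetical stand-in for DataInputBuffer, and the idea that the reader only fills valBuffer later, per record, is an assumption made for illustration rather than something stated in this report.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipEagerHeaderDemo {

    // Hypothetical stand-in for Hadoop's DataInputBuffer: it starts empty
    // and can be refilled later with reset(byte[]).
    static final class RefillableInput extends InputStream {
        private byte[] data = new byte[0];
        private int pos = 0;

        void reset(byte[] newData) {
            data = newData;
            pos = 0;
        }

        @Override
        public int read() {
            // Return the next byte as 0..255, or -1 when the buffer is exhausted.
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }
    }

    // Produce one valid gzip member containing the given text.
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // True if the GZIPInputStream constructor itself throws EOFException,
    // i.e. the header read happens eagerly, before any read() call.
    static boolean eagerConstructionFails(InputStream source) throws IOException {
        try {
            new GZIPInputStream(source);
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    // Drain a stream and decode it as UTF-8 text.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        while ((n = in.read(buf)) != -1) {
            bos.write(buf, 0, n);
        }
        return new String(bos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        RefillableInput buffer = new RefillableInput();

        // 1. Empty buffer, as right after "valBuffer = new DataInputBuffer();".
        System.out.println("empty buffer fails: " + eagerConstructionFails(buffer));

        // 2. The same buffer after it has been filled with a gzip member.
        buffer.reset(gzip("hello"));
        GZIPInputStream in = new GZIPInputStream(buffer);
        System.out.println("after refill: " + readAll(in));
    }
}
```

Running main should print "empty buffer fails: true" followed by "after refill: hello", which suggests the constructor fails only because the buffer is empty at construction time, not because the compressed data itself is bad.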
Issue Links
duplicates
- HADOOP-8582 Improve error reporting for GZIP-compressed SequenceFiles with missing native libraries. (Open)
is broken by
- HADOOP-538 Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec' (Closed)