Details
Bug
Status: Resolved
Major
Resolution: Won't Fix
0.5.0
None
None
Description
The extended error message with the offending values finally paid off, and I was able to capture the values that were causing the Summer buffer overflow exception.
java.lang.RuntimeException: Summer buffer overflow b.len=4096, off=0, summed=512, read=2880, bytesPerSum=1, inSum=512
at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:100)
at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:170)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
at java.io.DataInputStream.read(DataInputStream.java:80)
at org.apache.hadoop.util.CopyFiles$DFSCopyFilesMapper.copy(CopyFiles.java:190)
at org.apache.hadoop.util.CopyFiles$DFSCopyFilesMapper.map(CopyFiles.java:391)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:196)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1075)
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.util.zip.CRC32.update(CRC32.java:43)
at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:98)
... 9 more
Tracking through the code, what happens is that inside FSDataInputStream.Checker.read(), verifySum gets an EOFException and turns off the summing. Among other things, this sets bytesPerSum to 1. Unfortunately, that leads to the ArrayIndexOutOfBoundsException.
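The values in the error message suggest one way the exception could arise. The sketch below is not the Hadoop source: the class name, the helper methods, and the loop arithmetic (goal = bytesPerSum - inSum, capped by the bytes still buffered) are assumptions reconstructed from the reported numbers. The one part that is certain is that java.util.zip.CRC32.update(byte[], int, int) throws ArrayIndexOutOfBoundsException when given a negative length.

```java
import java.util.zip.CRC32;

// Hypothetical reproduction of the arithmetic; NOT the actual Checker.read() code.
class SummerOverflowSketch {
    // Assumption: bytes still needed to complete the current checksum chunk.
    static int goal(int bytesPerSum, int inSum) {
        return bytesPerSum - inSum;
    }

    // Assumption: bytes fed to the CRC on this pass, i.e. the smaller of
    // what is left in the buffer and what the current chunk still needs.
    static int toSum(int read, int summed, int bytesPerSum, int inSum) {
        int inBuf = read - summed;
        return Math.min(inBuf, goal(bytesPerSum, inSum));
    }

    // Returns true if CRC32.update rejects the given offset/length.
    static boolean updateOverflows(byte[] b, int off, int len) {
        try {
            new CRC32().update(b, off, len);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] b = new byte[4096];            // b.len=4096 from the message
        int off = 0, summed = 512, read = 2880;
        int bytesPerSum = 1, inSum = 512;     // state after summing was turned off
        int len = toSum(read, summed, bytesPerSum, inSum);
        System.out.println("toSum=" + len
            + " overflows=" + updateOverflows(b, off + summed, len));
    }
}
```

Under these assumptions, resetting bytesPerSum to 1 while inSum is still 512 yields a negative length, which would explain why the failure shows up in CRC32.update rather than at the point where the EOF occurred.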
I think the problem is that the original EOF exception was logged and ignored. I propose that we let the original EOFException propagate back to the caller, so that a missing checksum file (file not found) will still disable checksum checking, but truncated checksum files will be detected.
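A minimal sketch of the proposed policy, under stated assumptions: ChecksumPolicySketch, SumOpener, openSums, and readStoredSum are hypothetical names, not Hadoop APIs, and reading the stored checksum as a 4-byte int is assumed for illustration. Only FileNotFoundException disables checking; an EOFException from a truncated checksum file reaches the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical illustration of the proposal; NOT the actual Hadoop code.
class ChecksumPolicySketch {
    // Hypothetical stand-in for whatever opens the checksum file.
    interface SumOpener {
        DataInputStream open() throws IOException;
    }

    // Returns null when the checksum file is absent, meaning
    // "checksum checking disabled". Other IOExceptions propagate.
    static DataInputStream openSums(SumOpener opener) throws IOException {
        try {
            return opener.open();
        } catch (FileNotFoundException e) {
            return null;
        }
    }

    // Reads one stored checksum (assumed to be a 4-byte int).
    // An EOFException from a truncated file is deliberately not caught.
    static int readStoredSum(DataInputStream sums) throws IOException {
        return sums.readInt();
    }

    public static void main(String[] args) throws IOException {
        DataInputStream missing = openSums(() -> {
            throw new FileNotFoundException("no checksum file");
        });
        System.out.println("checking enabled: " + (missing != null));

        // Two bytes cannot hold a full 4-byte checksum, simulating truncation.
        DataInputStream truncated = openSums(
            () -> new DataInputStream(new ByteArrayInputStream(new byte[2])));
        try {
            readStoredSum(truncated);
        } catch (EOFException e) {
            System.out.println("truncated checksum file detected");
        }
    }
}
```

Keeping the two cases separate means a caller sees a truncated checksum file as an error instead of silently continuing in a half-disabled state.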