It appears that the originally proposed location for the fix, LineRecordReader's next() method (0.20.x) or nextKeyValue() (trunk), isn't workable because of buffering. Ideally, one could call getFilePosition() after reaching the end of the first member/zlib stream, notice that it isn't equal to the end of the file, and optionally throw an error. In general, however, the file position is already past the end of the zlib stream, since the decompressor has read ahead into its input buffer; for small concatenated inputs it may even be at the end of the file although the logical offset isn't. There doesn't appear to be any way to get at the logical "stream offset" at this level; if anyone knows of one, please let me know.
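To make the buffering problem concrete, here is a minimal standalone sketch using plain java.util.zip rather than Hadoop's LineRecordReader or codec classes. It concatenates two small zlib streams, reads the "file" through a large buffer the way a buffered reader would, and inflates only the first stream. The raw file position ends up at EOF, while the logical end of the first stream is earlier. The sketch can only recover the logical offset because it holds the Inflater directly and can ask for getRemaining(); presumably that handle is exactly what's missing at the record-reader level. The class name, buffer size, and sample strings are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ConcatPositionDemo {
    // Compress text as one standalone zlib stream (one "member").
    static byte[] zlib(String text) {
        Deflater d = new Deflater();
        d.setInput(text.getBytes(StandardCharsets.UTF_8));
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
    }

    // Inflate only the first zlib stream, pulling input through a large
    // buffer. Returns {raw file position, logical end of first stream}.
    static long[] positionsAfterFirstStream(byte[] fileBytes) {
        ByteArrayInputStream in = new ByteArrayInputStream(fileBytes);
        byte[] readBuf = new byte[64 * 1024]; // hypothetical buffer size
        byte[] outBuf = new byte[256];
        long filePosition = 0; // bytes pulled from the underlying "file"
        Inflater inf = new Inflater();
        try {
            while (!inf.finished()) {
                if (inf.needsInput()) {
                    int n = in.read(readBuf, 0, readBuf.length);
                    if (n < 0) {
                        throw new IllegalStateException("truncated stream");
                    }
                    filePosition += n;
                    inf.setInput(readBuf, 0, n);
                }
                inf.inflate(outBuf); // decompressed bytes are discarded here
            }
            // Bytes handed to the Inflater but not consumed belong to the
            // next stream, so subtract them to get the logical offset.
            long logicalEnd = filePosition - inf.getRemaining();
            return new long[] {filePosition, logicalEnd};
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt zlib data", e);
        } finally {
            inf.end();
        }
    }

    public static void main(String[] args) {
        byte[] first = zlib("first member\n");
        byte[] second = zlib("second member\n");
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(first, 0, first.length);
        file.write(second, 0, second.length);
        byte[] fileBytes = file.toByteArray();

        long[] pos = positionsAfterFirstStream(fileBytes);
        System.out.println("file length:      " + fileBytes.length);
        System.out.println("file position:    " + pos[0]);
        System.out.println("first stream end: " + pos[1]);
    }
}
```

Running main shows a file position equal to the file length, while the first stream ends partway through the file, which is the small-input case described above.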
In the meantime, we're planning to simply fix the bug (i.e., MAPREDUCE-469), at least for the native-zlib codec. A workaround for the Java-zlib alternative is in the 30-AUG-2006 comment on Sun's bug 4691425 (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4691425), but it comes without any explicit license that would allow us to redistribute it as part of Hadoop. bzip2 is reportedly already fixed on the trunk (HADOOP-4012).
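For reference, the general technique for reading concatenated zlib streams with java.util.zip.Inflater is: when finished() becomes true, take the unconsumed tail reported by getRemaining(), reset() the inflater, and keep going on those bytes. The sketch below is an independent illustration of that idea, not the workaround from Sun's bug comment and not what the Hadoop fix will look like. It handles raw zlib streams only; gzip members would additionally need their headers and trailers parsed, which is omitted. Class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ConcatInflate {
    // Decompress every zlib stream in `input`, back to back.
    static byte[] inflateAll(byte[] input) {
        Inflater inf = new Inflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        inf.setInput(input, 0, input.length);
        try {
            while (true) {
                int n = inf.inflate(buf);
                out.write(buf, 0, n);
                if (inf.finished()) {
                    int remaining = inf.getRemaining();
                    if (remaining == 0) {
                        break; // last stream ended exactly at end of input
                    }
                    // Unconsumed bytes start the next stream: restart on them.
                    inf.reset();
                    inf.setInput(input, input.length - remaining, remaining);
                } else if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    // No progress possible: input ran out mid-stream, or a
                    // preset dictionary is required (not supported here).
                    throw new IllegalArgumentException("truncated or unsupported zlib stream");
                }
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib data", e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    // Helper for the example: compress text as a single zlib stream.
    static byte[] zlib(String text) {
        Deflater d = new Deflater();
        d.setInput(text.getBytes(StandardCharsets.UTF_8));
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] a = zlib("hello, ");
        byte[] b = zlib("world\n");
        byte[] both = new byte[a.length + b.length];
        System.arraycopy(a, 0, both, 0, a.length);
        System.arraycopy(b, 0, both, a.length, b.length);
        // Prints "hello, world" rather than stopping after the first stream.
        System.out.print(new String(inflateAll(both), StandardCharsets.UTF_8));
    }
}
```

Stopping at the first finished() is what produces the truncated output; the reset-and-continue branch is the whole fix in this simplified setting.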
Barring any new information, I plan to resolve this issue as invalid.