The following simplified code (manually picked out of testMoreBzip2() in https://issues.apache.org/jira/secure/attachment/12448272/HADOOP-6835.v4.trunk-hadoop-mapreduce.patch) triggers a "java.io.IOException: bad block header" in org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.initBlock(CBZip2InputStream.java:527):
The specified file is also included in the HADOOP-6835 patch linked above, and some additional debug output is included in the commented-out test loop above. (The debug output is only in the linked "v4" version of the patch, however; I'm about to remove the debug stuff for checkin.)
It's possible I've done something completely boneheaded here, but the file, at least, checks out in a subsequent set of subtests and with stock bzip2 itself. Only the code above is problematic; it reads through the first concatenated chunk (17 lines of text) just fine but chokes on the header of the second one. Altogether, the test file contains 84 lines of text and 4 concatenated bzip2 files.
(It's possible this is a MapReduce issue rather than a Common one, but note that the identical gzip test works fine. Possibly it's related to the stream-vs-decompressor dichotomy, though; is that intentionally not supported?)
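For reference, here is a minimal, JDK-only sketch of the behavior a reader of a concatenated file is expected to show: continue past the end of one compressed member into the header of the next. It uses java.util.zip's gzip classes rather than Hadoop's codec classes, so it illustrates the concatenation case only and says nothing about where the bzip2 failure comes from. The class name, the helper names and the line contents are made up for illustration. Only the 17-line first chunk, the 4 members and the 84-line total come from the description above; the split of the remaining lines is invented.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of Hadoop or of the HADOOP-6835 patch.
class ConcatGzipDemo {

    // Compress `lines` lines of text into one complete, standalone gzip member.
    static byte[] gzipMember(int lines, int firstLineNo) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            for (int i = 0; i < lines; i++) {
                String line = "line " + (firstLineNo + i) + "\n";
                gz.write(line.getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Append several independent gzip members back to back,
    // like `cat a.gz b.gz c.gz d.gz > all.gz`.
    static byte[] buildConcatenated(int... linesPerMember) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        int next = 1;
        for (int n : linesPerMember) {
            byte[] member = gzipMember(n, next);
            all.write(member, 0, member.length);
            next += n;
        }
        return all.toByteArray();
    }

    // Read the whole stream through one GZIPInputStream and count text lines.
    // The reader has to cross every member boundary to see all of them.
    static int countLines(byte[] data) {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(data)),
                StandardCharsets.UTF_8))) {
            int count = 0;
            while (r.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 17 matches the first chunk described above; 22/22/23 are invented.
        byte[] data = buildConcatenated(17, 22, 22, 23);
        System.out.println(countLines(data));
    }
}
```

A reader that stopped or failed at the first member boundary would report 17 lines (or throw) instead of the full count, which is the symptom described above for the bzip2 case.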