I have tested concatenated bzip2 files with hadoop-1.0.3 plus the HADOOP-7823 patch, and confirmed that they can be read correctly in an MR job. Below are the detailed steps of my testing:
1) create file test1, with content:
2) create file test2, with content:
3) compress them with the command "bzip2 -z test1 test2", which creates test1.bz2 and test2.bz2
4) create the concatenated bzip2 file with the command "cat test1.bz2 test2.bz2 > test-concatenate.bz2"
5) create the input directory in HDFS and upload the concatenated bzip2 file: "hadoop fs -mkdir /tmp/bzip2/input && hadoop fs -put test-concatenate.bz2 /tmp/bzip2/input"
6) run wordcount example program to test: "hadoop jar $HADOOP_HOME/hadoop-examples*.jar wordcount /tmp/bzip2/input /tmp/bzip2/output"
7) check the result; the output is correct, with content:
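The property this test relies on can also be checked locally without Hadoop. Below is a minimal sketch using Python's standard `bz2` module, assuming hypothetical contents for test1 and test2 (the original test data is not reproduced here). It shows that `cat` of two .bz2 files yields two back-to-back bzip2 streams, that a multi-stream-aware decoder recovers both files, and that a decoder which stops at the first end-of-stream marker returns only the first file's content, which is why concatenated input needs explicit handling in a reader.

```python
import bz2
from collections import Counter

# Hypothetical contents standing in for test1 and test2 (not the original data).
test1 = b"hello hadoop\nhello bzip2\n"
test2 = b"concatenated bzip2 stream\nhello again\n"

# Like "bzip2 -z test1 test2": each file becomes an independent bzip2 stream.
stream1 = bz2.compress(test1)
stream2 = bz2.compress(test2)

# Like "cat test1.bz2 test2.bz2 > test-concatenate.bz2": plain byte concatenation.
concatenated = stream1 + stream2

# bz2.decompress() decodes every concatenated stream (Python 3.3+),
# so the result is both original files joined together.
decoded = bz2.decompress(concatenated)

# A single BZ2Decompressor handles exactly one stream: it stops at the first
# end-of-stream marker and leaves the remaining bytes in .unused_data.
single = bz2.BZ2Decompressor()
first_only = single.decompress(concatenated)

# A word count over the fully decoded text, similar in spirit to wordcount.
counts = Counter(decoded.decode("utf-8").split())

if __name__ == "__main__":
    print("single-stream decoder reached EOF:", single.eof)
    print("bytes left undecoded:", len(single.unused_data))
    for word, n in sorted(counts.items()):
        print(f"{word}\t{n}")
```

If a reader only decoded `first_only`, the words from test2 would be missing from the wordcount output; seeing the words from both files in the MR result is what confirms the concatenated file was read completely.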