Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      HADOOP-6835 added the framework and direct support for concatenated gzip files. We should do the same for bzip2 files.

          Activity

          Harsh J added a comment -

          Allen,

          You closed this as "Won't Fix" without giving a reason, so I am reopening it. If there was a reason for the Won't Fix, please provide it. Thanks!

          Karthik Kambatla added a comment -

          AFAIK, people are interested in using concatenated bzip2 files. I think we should work on it.

          Yu Li added a comment -

          I have tested concatenated bzip2 files with hadoop-1.0.3 plus the patch from HADOOP-7823, and confirmed that they are read correctly in an MR job. Below are the detailed steps of my testing:

          1) create file test1, with content:
          =================================
          Hello World
          World test
          =================================
          2) create file test2, with content:
          =================================
          Hello Jay
          Jay test
          =================================
          3) compress them with the command "bzip2 -z test1 test2", which creates test1.bz2 and test2.bz2
          4) create the concatenated bzip2 file with the command "cat test1.bz2 test2.bz2 > test-contatenate.bz2"
          5) create a directory in HDFS and put the concatenated bzip2 file in it: "hadoop fs -mkdir /tmp/bzip2/input && hadoop fs -put test-contatenate.bz2 /tmp/bzip2/input"
          6) run the wordcount example program on it: "hadoop jar $HADOOP_HOME/hadoop-examples*.jar wordcount /tmp/bzip2/input /tmp/bzip2/output"
          7) check the result; the output is correct, with content:
          =================================
          Hello 2
          Jay 2
          World 2
          test 2
          =================================
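
The local half of the steps above (compress two files separately, concatenate the bytes, read everything back) can be sanity-checked without a cluster. The following is only a minimal sketch, assuming Python 3.3 or later, where the standard `bz2` module accepts multi-stream input. It does not exercise Hadoop's bzip2 codec or the HADOOP-7823 patch; the `word_count` helper is a hypothetical stand-in for the wordcount example job.

```python
import bz2

# Contents of test1 and test2 from steps 1 and 2.
test1 = b"Hello World\nWorld test\n"
test2 = b"Hello Jay\nJay test\n"

# Steps 3 and 4: compress each file as its own bzip2 stream, then
# concatenate the raw bytes, as `cat test1.bz2 test2.bz2` would.
concatenated = bz2.compress(test1) + bz2.compress(test2)

# bz2.decompress reads every stream in the input (Python 3.3+).
text = bz2.decompress(concatenated).decode("utf-8")

# A single BZ2Decompressor stops at the end of the first stream, which
# is how a reader without concatenation support loses the second file.
first_only = bz2.BZ2Decompressor().decompress(concatenated).decode("utf-8")


def word_count(s):
    """Hypothetical stand-in for the wordcount job: count whitespace-separated words."""
    counts = {}
    for word in s.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = word_count(text)
for word in sorted(counts):
    print(word, counts[word])
```

If the reader handled only the first stream, the counts would cover test1 alone and "Jay" would be missing, so seeing all four words with a count of 2 is what indicates both streams were read.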

          Harsh J added a comment -

          Thanks for confirming! Resolving as dupe.

          Harsh J added a comment -

          Dupe of HADOOP-4012.


            People

            • Assignee:
              Karthik Kambatla
              Reporter:
              Allen Wittenauer
            • Votes:
              0
              Watchers:
              6
