Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      HADOOP-6835 added the framework and direct support for concatenated gzip files. We should do the same for bzip2 files.

          Activity

          Transition            Time In Source Status   Execution Times   Last Executer      Last Execution Date
          Open -> Resolved      141d 20h 25m            1                 Allen Wittenauer   02/Nov/11 17:58
          Resolved -> Reopened  396d 17h 41m            1                 Harsh J            03/Dec/12 11:40
          Reopened -> Resolved  7d 5h 28m               1                 Harsh J            10/Dec/12 17:09
          Gavin made changes -
          Assignee Karthik Kambatla [ kkambatl ] Karthik Kambatla [ kasha ]
          Harsh J added a comment -

          Dupe of HADOOP-4012.

          Harsh J made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          Harsh J made changes -
          Link This issue duplicates HADOOP-4012 [ HADOOP-4012 ]
          Harsh J added a comment -

          Thanks for confirming! Resolving as dupe.

          Yu Li added a comment -

          I have tested concatenated bzip2 files with hadoop-1.0.3 plus the patch from HADOOP-7823, and confirmed that they are read correctly in an MR job. Below are the detailed steps of my testing:

          1) create file test1, with content:
          =================================
          Hello World
          World test
          =================================
          2) create file test2, with content:
          =================================
          Hello Jay
          Jay test
          =================================
          3) compress them using command "bzip2 -z test1 test2", and this would create test1.bz2 and test2.bz2
          4) create the concatenated bzip2 file with command "cat test1.bz2 test2.bz2 > test-contatenate.bz2"
          5) create dir and put the concatenated bzip2 file in HDFS: "hadoop fs -mkdir /tmp/bzip2/input && hadoop fs -put test-contatenate.bz2 /tmp/bzip2/input"
          6) run wordcount example program to test: "hadoop jar $HADOOP_HOME/hadoop-examples*.jar wordcount /tmp/bzip2/input /tmp/bzip2/output"
          7) check the result, it's correct with content:
          =================================
          Hello 2
          Jay 2
          World 2
          test 2
          =================================
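
          For readers without a cluster at hand, the steps above can be sketched locally. This is only an illustration in Python, not the Hadoop code path and not part of the original test: it builds the same two inputs, concatenates the two bzip2 streams as "cat" would, and shows why a reader has to keep decoding after the first end-of-stream marker. The helper names decompress_all_streams and decompress_first_stream are hypothetical, introduced just for this sketch.

```python
import bz2
from collections import Counter

# Contents of test1 and test2 from steps 1) and 2).
test1 = b"Hello World\nWorld test\n"
test2 = b"Hello Jay\nJay test\n"

# Rough equivalent of steps 3) and 4): two independent bzip2 streams
# placed back to back, as "cat test1.bz2 test2.bz2" would produce.
concatenated = bz2.compress(test1) + bz2.compress(test2)


def decompress_all_streams(data: bytes) -> bytes:
    """Decode every bzip2 stream in data, one after another (hypothetical helper)."""
    out = []
    while data:
        decomp = bz2.BZ2Decompressor()  # one decompressor handles exactly one stream
        out.append(decomp.decompress(data))
        if not decomp.eof:
            raise ValueError("truncated bzip2 stream")
        data = decomp.unused_data  # bytes that follow the end of this stream
    return b"".join(out)


def decompress_first_stream(data: bytes) -> bytes:
    """What a reader without concatenation support would see (hypothetical helper)."""
    return bz2.BZ2Decompressor().decompress(data)


# Stand-in for the wordcount job in step 6): count whitespace-separated words.
text = decompress_all_streams(concatenated).decode()
counts = Counter(text.split())
for word in sorted(counts):
    print(word, counts[word])
```

          A reader that stops at the first stream would only ever see the contents of test1, which is the failure mode that concatenation support is meant to avoid.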

          Karthik Kambatla (Inactive) added a comment -

          AFAIK, people are interested in using concatenated bzip2 files. I think we should work on it.

          Karthik Kambatla (Inactive) made changes -
          Assignee Karthik Kambatla [ kkambatl ]
          Harsh J made changes -
          Resolution Won't Fix [ 2 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Harsh J added a comment -

          Allen,

          You'd closed this out without a reason as "Won't Fix", so am reopening it. If there was a reason for the Won't Fix, please provide, thanks!

          Allen Wittenauer made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Won't Fix [ 2 ]
          Allen Wittenauer made changes -
          Field Original Value New Value
          Link This issue is blocked by HADOOP-6835 [ HADOOP-6835 ]
          Allen Wittenauer created issue -

            People

            • Assignee:
              Karthik Kambatla
              Reporter:
              Allen Wittenauer
            • Votes:
              0
              Watchers:
              7
