Hadoop Common / HADOOP-6835

Support concatenated gzip files


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.2
    • Fix Version/s: 0.22.0
    • Component/s: io
    • Labels: None
    • Hadoop Flags: Incompatible change, Reviewed
    • Release Note: Processing of concatenated gzip files formerly stopped (quietly) at the end of the first substream/"member"; now processing will continue to the end of the concatenated stream, like gzip(1) does. (bzip2 support is unaffected by this patch.)
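
    The difference between stopping at the first member and continuing to the end of the stream can be sketched as follows. This is an illustration in Python using the standard zlib module, not code from the attached patches; the function names read_first_member and read_all_members are hypothetical and chosen only for this example.

```python
import gzip
import zlib

GZIP_WBITS = 31  # 16 + 15: tell zlib to expect a gzip header and trailer


def read_first_member(data: bytes) -> bytes:
    """Decode one gzip member and silently ignore whatever follows it."""
    decoder = zlib.decompressobj(GZIP_WBITS)
    return decoder.decompress(data) + decoder.flush()


def read_all_members(data: bytes) -> bytes:
    """Decode every gzip member in turn until the input is exhausted."""
    chunks = []
    while data:
        decoder = zlib.decompressobj(GZIP_WBITS)
        chunks.append(decoder.decompress(data))
        chunks.append(decoder.flush())
        if not decoder.eof:
            raise ValueError("truncated gzip member")
        # Bytes after the end of this member belong to the next one.
        data = decoder.unused_data
    return b"".join(chunks)


# Two independently compressed members, concatenated byte for byte.
concatenated = gzip.compress(b"first member\n") + gzip.compress(b"second member\n")
```

    In this sketch read_first_member drops the second member without raising an error, which mirrors the "quiet" truncation the release note describes, while read_all_members keeps decoding until no input remains.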

    Description

      When running MapReduce with concatenated gzip files as input, only the first part is read, which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)
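
      For readers unfamiliar with the format, here is a small sketch of what "concatenated gzip" means: two separately compressed files joined byte for byte form a valid gzip file whose decoded content is the two originals back to back. The example uses Python's standard gzip module purely as an illustration; the helper concatenate_gzip_files and the file names are hypothetical, and nothing here reflects Hadoop's own codec implementation.

```python
import gzip
import tempfile
from pathlib import Path


def concatenate_gzip_files(sources, target):
    """Join gzip files byte for byte, like `cat a.gz b.gz > target`."""
    with open(target, "wb") as out:
        for src in sources:
            out.write(Path(src).read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    part_a = tmp_dir / "a.gz"
    part_b = tmp_dir / "b.gz"
    combined = tmp_dir / "combined.gz"

    # Each part is a complete gzip file (one member) on its own.
    part_a.write_bytes(gzip.compress(b"line 1\n"))
    part_b.write_bytes(gzip.compress(b"line 2\n"))

    concatenate_gzip_files([part_a, part_b], combined)

    # A reader that handles multi-member files returns both parts.
    with gzip.open(combined, "rb") as stream:
        decoded = stream.read()
```

      A reader that handles only a single member would return just the first line here, which is the symptom this issue reports for MapReduce input.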

      Attachments

        1. C6835-9.patch
          49 kB
          Christopher Douglas
        2. grr-hadoop-common.dif.20100614c
          37 kB
          Greg Roelofs
        3. grr-hadoop-mapreduce.dif.20100614c
          25 kB
          Greg Roelofs
        4. HADOOP-6835.v3.yahoo-0.20.2xx-branch.patch
          82 kB
          Greg Roelofs
        5. HADOOP-6835.v4.trunk-hadoop-common.patch
          18 kB
          Greg Roelofs
        6. HADOOP-6835.v4.trunk-hadoop-mapreduce.patch
          47 kB
          Greg Roelofs
        7. HADOOP-6835.v4.yahoo-0.20.2xx-branch.patch
          83 kB
          Greg Roelofs
        8. HADOOP-6835.v5.trunk-hadoop-common.patch
          20 kB
          Greg Roelofs
        9. HADOOP-6835.v6.trunk-hadoop-common.patch
          41 kB
          Greg Roelofs
        10. HADOOP-6835.v7.trunk-hadoop-common.patch
          41 kB
          Greg Roelofs
        11. HADOOP-6835.v8.trunk-hadoop-common.patch
          47 kB
          Greg Roelofs
        12. HADOOP-6835.v9.yahoo-0.20.2xx-branch.patch
          46 kB
          Greg Roelofs
        13. MR-469.v2.yahoo-0.20.2xx-branch.patch
          69 kB
          Greg Roelofs

          People

            Assignee: Greg Roelofs (roelofs)
            Reporter: Thomas White (tomwhite)
            Votes: 2
            Watchers: 14
