Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.2
    • Fix Version/s: 0.22.0
    • Component/s: io
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Processing of concatenated gzip files formerly stopped (quietly) at the end of the first substream/"member"; now processing will continue to the end of the concatenated stream, like gzip(1) does. (bzip2 support is unaffected by this patch.)

      Description

      When running MapReduce with concatenated gzip files as input, only the first member is read and the rest is silently ignored, which is confusing, to say the least. Concatenated gzip is described in the gzip manual (http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage) and in RFC 1952 (http://www.ietf.org/rfc/rfc1952.txt). (See the original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)
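
      The difference between the two behaviors can be reproduced outside Hadoop. The following is a minimal sketch in Python, not the code from the attached patches; the function names first_member_only and all_members are hypothetical and exist only for illustration. It assumes the input consists solely of complete gzip members with no trailing garbage.

```python
import gzip
import zlib

# wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
GZIP_WBITS = zlib.MAX_WBITS | 16


def first_member_only(data: bytes) -> bytes:
    """Mimic the old behavior: decode one gzip member and stop (hypothetical helper)."""
    d = zlib.decompressobj(GZIP_WBITS)
    # Bytes after the first member end up in d.unused_data and are dropped here.
    return d.decompress(data)


def all_members(data: bytes) -> bytes:
    """Mimic the fixed behavior: keep decoding until the input is exhausted (hypothetical helper)."""
    out = []
    while data:
        d = zlib.decompressobj(GZIP_WBITS)
        out.append(d.decompress(data))
        out.append(d.flush())
        # Whatever follows the member just decoded is the next member, if any.
        data = d.unused_data
    return b"".join(out)


# Two independently compressed members, concatenated as `cat a.gz b.gz` would do.
concatenated = gzip.compress(b"first\n") + gzip.compress(b"second\n")
```

      With this input, first_member_only(concatenated) yields only the bytes of the first member, while all_members(concatenated) yields both, matching what gzip.decompress from the Python standard library returns for a multi-member stream.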

      1. MR-469.v2.yahoo-0.20.2xx-branch.patch
        69 kB
        Greg Roelofs
      2. HADOOP-6835.v9.yahoo-0.20.2xx-branch.patch
        46 kB
        Greg Roelofs
      3. HADOOP-6835.v8.trunk-hadoop-common.patch
        47 kB
        Greg Roelofs
      4. HADOOP-6835.v7.trunk-hadoop-common.patch
        41 kB
        Greg Roelofs
      5. HADOOP-6835.v6.trunk-hadoop-common.patch
        41 kB
        Greg Roelofs
      6. HADOOP-6835.v5.trunk-hadoop-common.patch
        20 kB
        Greg Roelofs
      7. HADOOP-6835.v4.yahoo-0.20.2xx-branch.patch
        83 kB
        Greg Roelofs
      8. HADOOP-6835.v4.trunk-hadoop-mapreduce.patch
        47 kB
        Greg Roelofs
      9. HADOOP-6835.v4.trunk-hadoop-common.patch
        18 kB
        Greg Roelofs
      10. HADOOP-6835.v3.yahoo-0.20.2xx-branch.patch
        82 kB
        Greg Roelofs
      11. grr-hadoop-mapreduce.dif.20100614c
        25 kB
        Greg Roelofs
      12. grr-hadoop-common.dif.20100614c
        37 kB
        Greg Roelofs
      13. C6835-9.patch
        49 kB
        Chris Douglas

        Issue Links

          Activity

          Konstantin Shvachko made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Allen Wittenauer made changes -
          Link This issue blocks HADOOP-7386 [ HADOOP-7386 ]
          Allen Wittenauer made changes -
          Summary Support concatenated gzip and bzip2 files Support concatenated gzip files
          Greg Roelofs made changes -
          Chris Douglas made changes -
          Attachment C6835-9.patch [ 12448932 ]
          Chris Douglas made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Incompatible change] [Incompatible change, Reviewed]
          Resolution Fixed [ 1 ]
          Greg Roelofs made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Greg Roelofs made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Greg Roelofs made changes -
          Greg Roelofs made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Greg Roelofs made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Greg Roelofs made changes -
          Greg Roelofs made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Greg Roelofs made changes -
          Greg Roelofs made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Greg Roelofs made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Greg Roelofs made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Greg Roelofs made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Greg Roelofs made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Greg Roelofs made changes -
          Greg Roelofs made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Release Note Processing of concatenated gzip files formerly stopped (quietly) at the end of the first substream/"member"; now processing will continue to the end of the concatenated stream, like gzip(1) does. Processing of concatenated gzip files formerly stopped (quietly) at the end of the first substream/"member"; now processing will continue to the end of the concatenated stream, like gzip(1) does. (bzip2 support is unaffected by this patch.)
          Affects Version/s 0.20.2 [ 12314203 ]
          Fix Version/s 0.22.0 [ 12314296 ]
          Greg Roelofs made changes -
          Greg Roelofs made changes -
          Greg Roelofs made changes -
          Greg Roelofs made changes -
          Hadoop Flags [Incompatible change]
          Release Note Processing of concatenated gzip files formerly stopped (quietly) at the end of the first substream/"member"; now processing will continue to the end of the concatenated stream, like gzip(1) does.
          Component/s io [ 12310687 ]
          Greg Roelofs made changes -
          Project Hadoop Map/Reduce [ 12310941 ] Hadoop Common [ 12310240 ]
          Key MAPREDUCE-469 HADOOP-6835
          Greg Roelofs made changes -
          Attachment MR-469.v2.yahoo-0.20.2xx-branch.patch [ 12447661 ]
          Greg Roelofs made changes -
          Attachment grr-hadoop-common.dif.20100614c [ 12447105 ]
          Attachment grr-hadoop-mapreduce.dif.20100614c [ 12447106 ]
          Greg Roelofs made changes -
          Assignee Ravi Gummadi [ ravidotg ] Greg Roelofs [ roelofs ]
          Greg Roelofs made changes -
          Link This issue is cloned as MAPREDUCE-1795 [ MAPREDUCE-1795 ]
          Greg Roelofs made changes -
          Link This issue relates to MAPREDUCE-1795 [ MAPREDUCE-1795 ]
          Greg Roelofs made changes -
          Link This issue is cloned as MAPREDUCE-1795 [ MAPREDUCE-1795 ]
          David Ciemiewicz made changes -
          Summary Support concatenated gzip files Support concatenated gzip and bzip2 files
          Ravi Gummadi made changes -
          Link This issue is related to HADOOP-6335 [ HADOOP-6335 ]
          Ravi Gummadi made changes -
          Assignee Ravi Gummadi [ ravidotg ]
          Owen O'Malley made changes -
          Project Hadoop Common [ 12310240 ] Hadoop Map/Reduce [ 12310941 ]
          Key HADOOP-5014 MAPREDUCE-469
          Affects Version/s 0.17.0 [ 12312913 ]
          Affects Version/s 0.19.0 [ 12313211 ]
          Component/s io [ 12310687 ]
          Component/s mapred [ 12310690 ]
          Tom White made changes -
          Field Original Value New Value
          Link This issue is related to PIG-42 [ PIG-42 ]
          Tom White created issue -

            People

            • Assignee:
              Greg Roelofs
              Reporter:
              Tom White
            • Votes:
              2
              Watchers:
              14

              Dates

              • Created:
                Updated:
                Resolved: