Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: io
    • Labels:
      None
    • Target Version/s:
    • Release Note:
      Make Gzipped input splittable by offering a tradeoff between "Spent resources" and "Wall clock time"

      Description

      Files compressed with the gzip codec are not splittable due to the nature of the codec.
      This limits your options for scaling out when reading large gzipped input files.

      Gunzipping a 1 GiB file usually takes only about 2 minutes, so for some use cases spending extra resources may result in a shorter job time under certain conditions.
      Reading the entire input file from the start for each split, and discarding everything before that split's own range, wastes resources but may lead to additional scalability.
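
The idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from the attached patches: it is written in Python rather than against the Hadoop codec API, the helper names `split_bounds` and `read_split_lines` are hypothetical, and split offsets are taken in the uncompressed stream, which a real job would not know up front. A line is assigned to the split that contains its first byte, so every record is read exactly once across all splits.

```python
import gzip
import os
import tempfile


def split_bounds(total_size, num_splits):
    """Cut [0, total_size) into num_splits contiguous half-open ranges."""
    return [(i * total_size // num_splits, (i + 1) * total_size // num_splits)
            for i in range(num_splits)]


def read_split_lines(path, start, end):
    """Yield the lines whose first byte falls in [start, end).

    Offsets refer to the uncompressed stream (an assumption of this sketch).
    Every call decompresses from byte 0; data before `start` is decoded and
    thrown away, which is the deliberately wasted work.
    """
    offset = 0
    with gzip.open(path, "rb") as stream:
        for line in stream:
            if offset >= end:
                break  # past this split, so stop decompressing
            if offset >= start:
                yield line
            offset += len(line)


if __name__ == "__main__":
    # Small demo: write a gzip file, then read it back as three splits.
    data = b"".join(b"record-%d\n" % i for i in range(10))
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    try:
        with gzip.open(path, "wb") as f:
            f.write(data)
        for start, end in split_bounds(len(data), 3):
            lines = list(read_split_lines(path, start, end))
            print(start, end, len(lines))
    finally:
        os.remove(path)
```

With n equal splits, split i (counting from 0) decompresses roughly (i+1)/n of the file, so the total decompression work is about (n+1)/2 full passes, while the slowest split still needs one full pass. The approach can therefore only shorten wall clock time when processing each record costs noticeably more than decompressing it.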

      1. HADOOP-7076-2011-12-09-branch-0.22.patch
        40 kB
        Niels Basjes
      2. HADOOP-7076-2011-12-09.patch
        41 kB
        Niels Basjes
      3. HADOOP-7076-branch-0.22.patch
        40 kB
        Niels Basjes
      4. HADOOP-7076-2011-12-04-2332.patch
        40 kB
        Niels Basjes
      5. HADOOP-7076-2011-08-05-2315.patch
        43 kB
        Niels Basjes
      6. HADOOP-7076-2011-08-05-2255.patch
        6 kB
        Niels Basjes
      7. HADOOP-7076-2011-05-18.patch
        43 kB
        Niels Basjes
      8. HADOOP-7076-2011-02-06.patch
        42 kB
        Niels Basjes
      9. HADOOP-7076-2011-02-05.patch
        42 kB
        Niels Basjes
      10. HADOOP-7076-2011-01-29.patch
        41 kB
        Niels Basjes
      11. HADOOP-7076-2011-01-26.patch
        40 kB
        Niels Basjes
      12. HADOOP-7076.patch
        40 kB
        Niels Basjes

        Issue Links

          Activity

          Robert Joseph Evans made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Later [ 7 ]
          Arun C Murthy made changes -
          Target Version/s 0.23.1, 0.22.0 [ 12318884, 12314296 ] 0.23.2 [ 12319855 ]
          Harsh J made changes -
          Link This issue is related to HADOOP-6153 [ HADOOP-6153 ]
          Tim Broberg made changes -
          Link This issue is related to HADOOP-7909 [ HADOOP-7909 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076-2011-12-09.patch [ 12506791 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076-branch-0.22.patch [ 12506182 ]
          Niels Basjes made changes -
          Target Version/s 0.23.1 [ 12318884 ] 0.22.0, 0.23.1 [ 12314296, 12318884 ]
          Eli Collins made changes -
          Fix Version/s 0.23.1 [ 12318884 ]
          Target Version/s 0.23.1 [ 12318884 ]
          Niels Basjes made changes -
          Fix Version/s 0.23.1 [ 12318884 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076-2011-12-04-2332.patch [ 12506062 ]
          Niels Basjes made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Niels Basjes made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076-2011-08-05-2315.patch [ 12489536 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076-2011-08-05-2255.patch [ 12489532 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076-2011-05-18.patch [ 12479577 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076-2011-02-06.patch [ 12470415 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076-2011-02-05.patch [ 12470362 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076-2011-01-29.patch [ 12469755 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076-2011-01-26.patch [ 12469492 ]
          Niels Basjes made changes -
          Release Note Make Gzipped input splittable by offering a tradeoff between "Spent resources" and "Wall clock time"
          Jakob Homan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Jakob Homan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Niels Basjes made changes -
          Status In Progress [ 3 ] Patch Available [ 10002 ]
          Doug Cutting made changes -
          Assignee Niels Basjes [ nielsbasjes ]
          Niels Basjes made changes -
          Attachment HADOOP-7076.patch [ 12467585 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076.patch [ 12467672 ]
          Niels Basjes made changes -
          Status Patch Available [ 10002 ] In Progress [ 3 ]
          Niels Basjes made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076.patch [ 12466874 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076.patch [ 12467585 ]
          Niels Basjes made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Niels Basjes made changes -
          Attachment HADOOP-7076.patch [ 12466874 ]
          Niels Basjes made changes -
          Field Original Value New Value
          Status Open [ 1 ] Patch Available [ 10002 ]
          Niels Basjes created issue -

            People

            • Assignee:
              Niels Basjes
              Reporter:
              Niels Basjes
            • Votes:
              0
              Watchers:
              23
