Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: io
    • Labels: None
    • Make Gzipped input splittable by offering a tradeoff between "Spent resources" and "Wall clock time"

    Description

      Files compressed with the gzip codec are not splittable due to the nature of the codec.
      This limits your options for scaling out when reading large gzipped input files.

      Given that gunzipping a 1 GiB file usually takes only 2 minutes, I figured that for some use cases spending extra resources may result in a shorter job time under certain conditions.
      So having each split read the entire input file from the start (wasting resources!) may allow the job to scale out further.
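
The idea can be illustrated with a minimal sketch. This is not the code from the attached patches and does not use Hadoop's API. It assumes newline-delimited records and splits defined as byte ranges of the uncompressed data; the function name `read_split` and its parameters are hypothetical. Every split decompresses from the start of the stream, throws away everything before its own range, and stops once it is past the end of that range.

```python
import gzip
import io


def read_split(gz_bytes, start, end):
    """Return the lines whose first byte lies in [start, end) of the
    uncompressed data. Hypothetical helper, for illustration only."""
    lines = []
    offset = 0  # uncompressed offset of the current line's first byte
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as stream:
        # gzip cannot seek to an arbitrary position, so always start at byte 0.
        for line in stream:
            if offset >= end:
                break  # past this split: stop decompressing
            if offset >= start:
                lines.append(line)  # this line belongs to the split
            # Lines before `start` were decompressed only to be discarded;
            # that is the wasted work traded for parallelism.
            offset += len(line)
    return lines
```

In this sketch the later splits pay the most: the last one decompresses the whole file. The approach therefore only helps when processing a record costs much more than decompressing it.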

      Attachments

        1. HADOOP-7076.patch
          40 kB
          Niels Basjes
        2. HADOOP-7076-2011-01-26.patch
          40 kB
          Niels Basjes
        3. HADOOP-7076-2011-01-29.patch
          41 kB
          Niels Basjes
        4. HADOOP-7076-2011-02-05.patch
          42 kB
          Niels Basjes
        5. HADOOP-7076-2011-02-06.patch
          42 kB
          Niels Basjes
        6. HADOOP-7076-2011-05-18.patch
          43 kB
          Niels Basjes
        7. HADOOP-7076-2011-08-05-2255.patch
          6 kB
          Niels Basjes
        8. HADOOP-7076-2011-08-05-2315.patch
          43 kB
          Niels Basjes
        9. HADOOP-7076-2011-12-04-2332.patch
          40 kB
          Niels Basjes
        10. HADOOP-7076-2011-12-09.patch
          41 kB
          Niels Basjes
        11. HADOOP-7076-2011-12-09-branch-0.22.patch
          40 kB
          Niels Basjes
        12. HADOOP-7076-branch-0.22.patch
          40 kB
          Niels Basjes

            People

              Assignee: Niels Basjes (nielsbasjes)
              Reporter: Niels Basjes (nielsbasjes)
              Votes: 0
              Watchers: 24
