
add hive.intermediate.compression.codec option


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Query Processor
    • Hadoop Flags: Reviewed
    • Release Note: HIVE-759. Add "hive.intermediate.compression.codec/type" option. (Yongqiang He via zshao)

    Description

      Hive uses the jobconf compression codec for all map-reduce jobs. This includes both mapred.map.output.compression.codec and mapred.output.compression.codec.

      In some cases, we want to distinguish between the codec used for intermediate map-reduce jobs (which produce intermediate data passed between jobs) and the codec used for final map-reduce jobs (which produce data stored in tables).

      For intermediate data, lzo might be a better fit because it's much faster; for final data, gzip might be a better fit because it saves disk space.

      We should introduce two new options:

      hive.intermediate.compression.codec=org.apache.hadoop.io.compress.LzoCodec
      hive.intermediate.compression.type=BLOCK
      

      We would then use these two options to override the mapred.output.compression.* settings in the FileSinkOperator that produces intermediate data.
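
      The override described above can be sketched roughly as follows. This is only an illustration, not the code from the attached patches: a plain Map stands in for the Hadoop JobConf, the class and method names (IntermediateCompressionSketch, effectiveConf, copyIfSet) are made up for the example, and the choice to fall back to the job-wide mapred.output.compression.* values when an intermediate option is unset is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map<String, String> stands in for the Hadoop JobConf.
class IntermediateCompressionSketch {
    // Option names taken from the issue description.
    static final String INTERMEDIATE_CODEC = "hive.intermediate.compression.codec";
    static final String INTERMEDIATE_TYPE = "hive.intermediate.compression.type";
    static final String OUTPUT_CODEC = "mapred.output.compression.codec";
    static final String OUTPUT_TYPE = "mapred.output.compression.type";

    /**
     * Returns a copy of jobConf holding the settings a file sink should use.
     * For an intermediate sink, the hive.intermediate.* values (when set)
     * replace the mapred.output.compression.* values. The input is not modified.
     */
    static Map<String, String> effectiveConf(Map<String, String> jobConf, boolean intermediate) {
        Map<String, String> result = new HashMap<>(jobConf);
        if (intermediate) {
            copyIfSet(result, INTERMEDIATE_CODEC, OUTPUT_CODEC);
            copyIfSet(result, INTERMEDIATE_TYPE, OUTPUT_TYPE);
        }
        return result;
    }

    // Assumption: an unset or empty intermediate option leaves the job-wide value in place.
    private static void copyIfSet(Map<String, String> conf, String from, String to) {
        String value = conf.get(from);
        if (value != null && !value.isEmpty()) {
            conf.put(to, value);
        }
    }
}
```

      Working on a copy keeps the final-data sink in the same job on the original, job-wide codec.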

      Note that a single map-reduce job may have two FileSinkOperators: one that produces intermediate data and one that produces final data. So we need to add a flag to fileSinkDesc to tell them apart.
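
      To illustrate why the flag has to live on the sink descriptor rather than on the job, here is a small sketch with two sinks in one job. It is not taken from the patches: SinkDesc is a hypothetical stand-in for fileSinkDesc, the field name intermediate and the directory names are invented, and a plain Map again stands in for the JobConf.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a per-sink "intermediate" flag.
class FileSinkFlagSketch {
    /** Stand-in for fileSinkDesc; only the fields needed for the example. */
    static final class SinkDesc {
        final String dirName;
        final boolean intermediate; // hypothetical name for the proposed flag

        SinkDesc(String dirName, boolean intermediate) {
            this.dirName = dirName;
            this.intermediate = intermediate;
        }
    }

    /** Picks the codec class name for one sink from the shared job configuration. */
    static String codecFor(SinkDesc desc, Map<String, String> jobConf) {
        if (desc.intermediate) {
            String override = jobConf.get("hive.intermediate.compression.codec");
            if (override != null && !override.isEmpty()) {
                return override;
            }
        }
        // Final sinks, and intermediate sinks without an override, use the job-wide codec.
        return jobConf.get("mapred.output.compression.codec");
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("mapred.output.compression.codec", "org.apache.hadoop.io.compress.GzipCodec");
        jobConf.put("hive.intermediate.compression.codec", "org.apache.hadoop.io.compress.LzoCodec");

        // One job, two sinks: the job-level configuration alone cannot tell them apart.
        List<SinkDesc> sinks = new ArrayList<>();
        sinks.add(new SinkDesc("/tmp/hive-scratch/stage-1", true));   // made-up path
        sinks.add(new SinkDesc("/warehouse/target_table", false));    // made-up path

        for (SinkDesc sink : sinks) {
            System.out.println(sink.dirName + " -> " + codecFor(sink, jobConf));
        }
    }
}
```

      Both sinks read the same job configuration, so the per-descriptor flag is the only thing that selects the intermediate codec for the first sink and leaves the second on the job-wide codec.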

      Attachments

        1. hive-759-2009-08-18-2.patch
          5 kB
          He Yongqiang
        2. hive-759-2009-08-18.patch
          6 kB
          He Yongqiang
        3. hive-759-2009-08-17.patch
          5 kB
          He Yongqiang

          People

            Assignee: He Yongqiang
            Reporter: Zheng Shao