  1. Apache Avro
  2. AVRO-1862

AvroOutputFormat saves compressed Avro files without respecting the codec's default extension

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Not A Problem
    • 1.8.1
    • 1.9.0
    • java
    • avro.mapred.output.extension.from-codec (boolean): MapReduce job property that changes the extension of output files from .avro to .$compressionCodec.avro
    • Patch

    Description

      A common pattern when naming compressed files is to give them an extension derived from the compression codec, for example .gz, .zip, or .bz2.
      AvroOutputFormat currently does not respect this convention.

      I've adapted some code from Hadoop's TextOutputFormat in a backward-compatible manner, adding the following JobConf property:

      avro.mapred.output.extension.from-codec (boolean, default: false) - when set to true, the extension is changed according to the rule above.

      EDIT: Please see the first comment for an update. When the above property is set to true, the file extension will be, for example, .gz.avro or .snappy.avro.
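
The naming rule described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the attached patches: the class `OutputExtensionSketch` and the helper `outputExtension` are hypothetical names, and the codec extension is passed in as a string (something like the value Hadoop's `CompressionCodec.getDefaultExtension()` would return, e.g. `.gz` or `.snappy`) so that the example runs without Hadoop or Avro on the classpath.

```java
public class OutputExtensionSketch {
    // Property name as given in the issue description.
    static final String EXT_FROM_CODEC_KEY = "avro.mapred.output.extension.from-codec";

    // Extension used for Avro output files when the property is false.
    static final String AVRO_EXT = ".avro";

    /**
     * Hypothetical helper: returns the output file extension.
     *
     * @param codecExtension the codec's default extension, e.g. ".gz" or ".snappy";
     *                       null or empty means no compression (an assumption of this sketch)
     * @param extFromCodec   value of the boolean property (default: false)
     */
    static String outputExtension(String codecExtension, boolean extFromCodec) {
        if (!extFromCodec || codecExtension == null || codecExtension.isEmpty()) {
            // Backward-compatible behaviour: plain ".avro".
            return AVRO_EXT;
        }
        // Codec extension first, then ".avro", e.g. ".snappy.avro".
        return codecExtension + AVRO_EXT;
    }

    public static void main(String[] args) {
        System.out.println("part-00000" + outputExtension(".snappy", true));  // part-00000.snappy.avro
        System.out.println("part-00000" + outputExtension(".snappy", false)); // part-00000.avro
    }
}
```

With the flag left at its default of false, the result is always `.avro`, which is what keeps the proposed change backward-compatible.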

      Attachments

        1. AVRO-1862.patch
          4 kB
          Piotr Wikieł
        2. AVRO-1862-1.patch
          4 kB
          Piotr Wikieł

          People

            Assignee: Unassigned
            Reporter: Piotr Wikieł (wikp)
