Hadoop Common / HADOOP-7196

CompressionCodecFactory returns unconfigured GzipCodec if io.compression.codecs is not set

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.2
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      If the io.compression.codecs property is not set, CompressionCodecFactory adds GzipCodec using this code:

      List<Class<? extends CompressionCodec>> codecClasses = getCodecClasses(conf);
      if (codecClasses == null) {
        addCodec(new GzipCodec());
        addCodec(new DefaultCodec());      
      } else {
        Iterator<Class<? extends CompressionCodec>> itr = codecClasses.iterator();
        while (itr.hasNext()) {
          CompressionCodec codec = ReflectionUtils.newInstance(itr.next(), conf);
          addCodec(codec);     
        }
      }
      

      which leaves GzipCodec unconfigured, because new GzipCodec() never receives the Configuration. If the codec is instead listed in the io.compression.codecs property, it is configured properly via ReflectionUtils.newInstance(..., conf).
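      The difference between the two paths can be illustrated with a minimal, self-contained sketch. It does not use Hadoop itself: Conf, Configurable, ToyCodec and newInstance below are hypothetical stand-ins for Configuration, Configurable, a codec such as GzipCodec, and ReflectionUtils.newInstance, and the io.file.buffer.size lookup is only an assumed example of a setting a codec might read from its configuration.

```java
import java.util.HashMap;
import java.util.Map;

public class UnconfiguredCodecSketch {
    /** Hypothetical stand-in for org.apache.hadoop.conf.Configuration. */
    static class Conf {
        private final Map<String, String> values = new HashMap<>();

        void set(String key, String value) {
            values.put(key, value);
        }

        int getInt(String key, int defaultValue) {
            String v = values.get(key);
            return v == null ? defaultValue : Integer.parseInt(v);
        }
    }

    /** Hypothetical stand-in for org.apache.hadoop.conf.Configurable. */
    interface Configurable {
        void setConf(Conf conf);
    }

    /** Hypothetical codec that reads a setting from its configuration. */
    static class ToyCodec implements Configurable {
        private Conf conf; // stays null unless setConf is called

        public ToyCodec() {
        }

        @Override
        public void setConf(Conf conf) {
            this.conf = conf;
        }

        int bufferSize() {
            // Throws NullPointerException if conf was never injected.
            return conf.getInt("io.file.buffer.size", 4096);
        }
    }

    /** Assumed behavior of the reflective path: instantiate, then inject the configuration. */
    static <T> T newInstance(Class<T> cls, Conf conf) {
        try {
            T instance = cls.getDeclaredConstructor().newInstance();
            if (instance instanceof Configurable) {
                ((Configurable) instance).setConf(conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if using the codec fails with a NullPointerException. */
    static boolean failsWithNpe(ToyCodec codec) {
        try {
            codec.bufferSize();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.set("io.file.buffer.size", "8192");

        ToyCodec viaNew = new ToyCodec();                           // fallback path
        ToyCodec viaReflection = newInstance(ToyCodec.class, conf); // configured path

        System.out.println("new ToyCodec() fails with NPE: " + failsWithNpe(viaNew));
        System.out.println("reflective buffer size: " + viaReflection.bufferSize());
    }
}
```

      In this sketch only the instance created through newInstance can be used safely; the one created with a plain constructor call fails as soon as it touches its configuration.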

      I have seen a lot of NPEs on systems that don't have this property set when using a LineRecordReader (which internally gets its codec from CompressionCodecFactory).

      I would suggest using org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec as the default value for io.compression.codecs, instead of having a separate code path for the case where this property is not set.
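      A rough sketch of that suggestion, assuming the property value arrives as a plain string (null when unset): the default is substituted before parsing, so every class name goes through one code path. DefaultCodecListSketch and codecClassNames are hypothetical names for illustration and are not part of Hadoop; in the real factory each resulting class would then presumably be instantiated with ReflectionUtils.newInstance(..., conf), as in the existing else branch.

```java
import java.util.ArrayList;
import java.util.List;

public class DefaultCodecListSketch {
    /** Default proposed in this report, used when io.compression.codecs is unset. */
    static final String DEFAULT_CODECS =
        "org.apache.hadoop.io.compress.GzipCodec,"
        + "org.apache.hadoop.io.compress.DefaultCodec";

    /** Returns the codec class names to load; 'configured' is null when the property is unset. */
    static List<String> codecClassNames(String configured) {
        String value = (configured == null) ? DEFAULT_CODECS : configured;
        List<String> names = new ArrayList<>();
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Property unset: the default list is used.
        System.out.println(codecClassNames(null));
        // Property set: only the configured classes are used.
        System.out.println(codecClassNames("com.example.MyCodec"));
    }
}
```

      With a default in place there is no longer a branch that constructs codecs directly, so no codec can end up without a configuration.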

          People

            Assignee: Unassigned
            Reporter: Peter Voss (pvoss)