Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.20.2
Description
If the io.compression.codecs property is not set, the GzipCodec is added using this code:
    List<Class<? extends CompressionCodec>> codecClasses = getCodecClasses(conf);
    if (codecClasses == null) {
      addCodec(new GzipCodec());
      addCodec(new DefaultCodec());
    } else {
      Iterator<Class<? extends CompressionCodec>> itr = codecClasses.iterator();
      while (itr.hasNext()) {
        CompressionCodec codec = ReflectionUtils.newInstance(itr.next(), conf);
        addCodec(codec);
      }
    }
This leaves the GzipCodec unconfigured, because it is created with new GzipCodec() and never receives conf. If the codec is instead listed in the io.compression.codecs property, it is configured properly through ReflectionUtils.newInstance(..., conf).
I have seen many NullPointerExceptions on systems that do not set this property when using a LineRecordReader, which internally gets its codec from CompressionCodecFactory.
I suggest using org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec as the default value for io.compression.codecs, instead of keeping a separate code path for the case where this property is not set.
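The suggestion above could look roughly like the following sketch. It is not the Hadoop source: Conf, Codec, GzipCodec, DefaultCodec and loadCodecs are simplified stand-ins defined here so the example compiles without Hadoop on the classpath, and the default list uses the stand-in class names rather than the org.apache.hadoop.io.compress ones. The point it illustrates is that with a default value there is only one code path, and every codec is created reflectively and handed the configuration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CodecFactorySketch {
    /** Hypothetical, simplified stand-in for Hadoop's Configuration. */
    public static class Conf {
        private final Map<String, String> props = new HashMap<>();
        public void set(String key, String value) { props.put(key, value); }
        public String get(String key, String defaultValue) {
            return props.getOrDefault(key, defaultValue);
        }
    }

    /** Hypothetical stand-in for a configurable compression codec. */
    public interface Codec {
        void setConf(Conf conf);
        Conf getConf();
    }

    public abstract static class BaseCodec implements Codec {
        private Conf conf; // stays null if the codec is never configured
        public void setConf(Conf conf) { this.conf = conf; }
        public Conf getConf() { return conf; }
    }

    public static class GzipCodec extends BaseCodec { }
    public static class DefaultCodec extends BaseCodec { }

    public static final String CODECS_KEY = "io.compression.codecs";

    // Default used when the property is unset (stand-in class names).
    public static final String DEFAULT_CODECS =
        GzipCodec.class.getName() + "," + DefaultCodec.class.getName();

    /** Single code path: read the (possibly defaulted) list, then instantiate. */
    public static List<Codec> loadCodecs(Conf conf) {
        List<Codec> codecs = new ArrayList<>();
        for (String name : conf.get(CODECS_KEY, DEFAULT_CODECS).split(",")) {
            codecs.add(newInstance(name.trim(), conf));
        }
        return codecs;
    }

    /** Mimics the idea of ReflectionUtils.newInstance: create, then configure. */
    static Codec newInstance(String className, Conf conf) {
        try {
            Codec codec = Class.forName(className)
                .asSubclass(Codec.class)
                .getDeclaredConstructor()
                .newInstance();
            codec.setConf(conf);
            return codec;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create codec " + className, e);
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf(); // io.compression.codecs deliberately left unset
        for (Codec codec : loadCodecs(conf)) {
            System.out.println(codec.getClass().getSimpleName()
                + " configured: " + (codec.getConf() != null));
        }
    }
}
```

With this shape, an unset property and an explicitly set property go through the same reflective instantiation, so no codec can end up without a configuration.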