Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Fix
Description
I recently found a bug in the GPL compression libraries that was causing map tasks for a particular job to OOM:
https://github.com/omalley/hadoop-gpl-compression/issues/3
Now granted, it does not make much sense for a job to use LzopCodec rather than LzoCodec for map output compression, but arguably other codecs could do similar things and cause the same sort of memory leak. I propose that we do a sanity check when creating a new decompressor/compressor: if the class of the newly created object does not match the value returned by getDecompressorType()/getCompressorType(), caching should be turned off for that codec.
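A minimal, self-contained sketch of the proposed sanity check. The `Decompressor`/`CompressionCodec` interfaces and `CodecPoolSketch` class below are simplified stand-ins for Hadoop's real types, not the actual CodecPool implementation:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified stand-ins for Hadoop's Decompressor and CompressionCodec
// interfaces (hypothetical; the real types have many more methods).
interface Decompressor {}

interface CompressionCodec {
    Decompressor createDecompressor();
    Class<? extends Decompressor> getDecompressorType();
}

// Sketch of the proposed check: if a codec hands back an object whose class
// differs from what getDecompressorType() advertises, stop caching instances
// for that codec so they are never pooled under the wrong key.
class CodecPoolSketch {
    private static final Set<Class<?>> uncacheableCodecs = new HashSet<>();

    static Decompressor getDecompressor(CompressionCodec codec) {
        Decompressor d = codec.createDecompressor();
        if (d != null && !d.getClass().equals(codec.getDecompressorType())) {
            // Mismatch like LzopCodec returning an LzopDecompressor while
            // advertising LzoDecompressor: disable caching for this codec.
            uncacheableCodecs.add(codec.getClass());
        }
        return d;
    }

    static boolean isCacheable(CompressionCodec codec) {
        return !uncacheableCodecs.contains(codec.getClass());
    }
}
```

With this check in place, a codec whose factory method and declared type disagree would simply lose pooling, rather than silently filling the pool with objects filed under the wrong class.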