Details
- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 0.11.0, 0.12.0, 0.13.1, 0.14.0, 1.2.0
- Fix Version/s: None
- Component/s: None
Description
MapTask hit OOM in the following situation in our production environment:
- src: 2048 partitions, each with 1 file of about 2MB using RCFile format
- query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
- Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
- MapTask memory Xmx: 1.5GB
By analyzing the heap dump with jhat, we found that the problem is:
- A single mapper processes many partitions (because of CombineHiveInputFormat).
- Each input path (equivalent to a partition here) constructs its own SerDe.
- Each SerDe does its own caching of the deserialized object (and tries to reuse it), but never releases it. In this case, serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct that can take a lot of space: roughly the last N rows of a file, where N is the number of rows in a columnar block.
- This problem may exist in other SerDes as well, but columnar file formats are affected the most because they cache the last N rows instead of a single row.
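The retention pattern described above can be sketched as follows. This is a minimal illustration with hypothetical class and field names (only cachedLazyStruct mirrors a real Hive field), not Hive code: each input path gets its own deserializer, each deserializer pins one cached block of N rows, and the task holds all of them until it finishes, so retained memory scales with partitions * N.

```java
import java.util.ArrayList;
import java.util.List;

public class CachedSerDeSketch {

    /** Hypothetical stand-in for a columnar SerDe that reuses one deserialized struct. */
    static class ColumnarDeserializer {
        // Mirrors the cachedLazyStruct pattern: lazily created, never cleared.
        private int[] cachedRowBlock;

        Object deserialize(int rowsPerBlock) {
            if (cachedRowBlock == null) {
                cachedRowBlock = new int[rowsPerBlock]; // ~ last N rows of a block
            }
            return cachedRowBlock;
        }
    }

    public static void main(String[] args) {
        int partitions = 2048;     // one combined split may cover many partitions
        int rowsPerBlock = 10_000; // N rows cached per columnar block (illustrative)

        List<ColumnarDeserializer> perPathSerDes = new ArrayList<>();
        long retainedInts = 0;
        for (int p = 0; p < partitions; p++) {
            ColumnarDeserializer serde = new ColumnarDeserializer(); // one per path
            serde.deserialize(rowsPerBlock);
            perPathSerDes.add(serde); // the task keeps every SerDe alive
            retainedInts += rowsPerBlock;
        }
        // Retained memory grows with partitions * N, not with a single row.
        System.out.println(retainedInts); // 2048 * 10000 = 20480000
    }
}
```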
Proposed solution:
- Remove cachedLazyStruct from serde2.columnar.ColumnarSerDeBase. The savings from not recreating a single object are too small compared to the cost of processing N rows.
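As a sketch of the proposed fix, using the same hypothetical names as above (not the actual Hive patch): drop the cached field and allocate the struct per deserialize() call, so it becomes garbage as soon as the caller is done with it and consecutive calls share no state.

```java
public class RemoveCacheSketch {

    /** Fixed variant: no cached field, a fresh struct per call. */
    static class ColumnarDeserializer {
        Object deserialize(int rowsPerBlock) {
            // One extra allocation per call is negligible next to
            // deserializing N rows, and nothing is retained across calls.
            return new int[rowsPerBlock];
        }
    }

    public static void main(String[] args) {
        ColumnarDeserializer serde = new ColumnarDeserializer();
        Object a = serde.deserialize(4);
        Object b = serde.deserialize(4);
        // Unlike the cached version, consecutive calls return distinct objects.
        System.out.println(a != b); // true
    }
}
```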
Alternative solutions:
- We can also free up the whole SerDe after processing a block/file. The problem is that an input split may contain multiple blocks/files that map to the same SerDe, and recreating a SerDe is a much bigger change to the code.
- We can also move SerDe creation/free-up to the point where the input file changes, but that also requires a much bigger change to the code.
- We can also add a "cleanup()" method to the SerDe interface that releases the cached object, but that change is not backward compatible with the many SerDes that people have written.
- We can make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly referenced object, but that feels like overkill.
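For completeness, the weak-reference alternative looks roughly like this (again a hypothetical sketch, not a Hive patch): the cached struct is reused while it stays reachable, but the collector may reclaim it under memory pressure. The extra WeakReference indirection and rebuild path is the "overkill" referred to above.

```java
import java.lang.ref.WeakReference;

public class WeakCacheSketch {

    /** Hypothetical SerDe whose cached struct is only weakly held. */
    static class ColumnarDeserializer {
        private WeakReference<int[]> cachedRowBlock = new WeakReference<>(null);

        int[] deserialize(int rowsPerBlock) {
            int[] block = cachedRowBlock.get();
            if (block == null || block.length != rowsPerBlock) {
                block = new int[rowsPerBlock];              // rebuild if collected
                cachedRowBlock = new WeakReference<>(block); // cache weakly
            }
            return block;
        }
    }

    public static void main(String[] args) {
        ColumnarDeserializer serde = new ColumnarDeserializer();
        int[] first = serde.deserialize(8);
        // While the block is strongly reachable, the cache reuses it.
        System.out.println(first == serde.deserialize(8)); // true
    }
}
```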