Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.3.4
- Fix Version/s: None
- Component/s: None
Environment
- Amazon Hadoop Distribution emr-5.20.0
- Master node with 4 CPUs and 16 GB RAM
- Table files stored in S3 cloud storage
Description
We are continuously loading data into a Hive table backed by ORC files, appending the data in batches. Over a span of a few days, we have repeatedly seen the Hive server throw OutOfMemoryError exceptions, which we believe are caused by memory leaks.
A comparison of heap dumps shows that the most suspicious classes, those that grow persistently and are not reclaimed by GC, are:
- org.apache.hadoop.hive.ql.io.orc.OrcStruct$Field
- org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField
- String
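For anyone who wants to repeat this kind of comparison without loading full heap dumps, here is a minimal sketch that diffs two class histograms. It assumes the histograms were captured as text with `jmap -histo:live <pid>` at two points in time. The function names `parse_histogram` and `growth` are hypothetical helpers written for this illustration. They are not part of Hive or the JDK, and this is not the tooling used for the analysis above.

```python
import re
from typing import Dict, List, Tuple

# Matches one data row of `jmap -histo` output, for example:
#    1:        120000        3840000  java.lang.String
# Header, separator, and "Total" lines do not match and are skipped.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histogram(text: str) -> Dict[str, int]:
    """Return {class name: instance count} for every data row in the text."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group(3)] = int(match.group(1))
    return counts


def growth(before: str, after: str, top: int = 10) -> List[Tuple[str, int]]:
    """List the classes whose instance count grew the most between snapshots."""
    old = parse_histogram(before)
    new = parse_histogram(after)
    # A class missing from the first snapshot counts as zero instances.
    deltas = [(name, count - old.get(name, 0)) for name, count in new.items()]
    grown = [item for item in deltas if item[1] > 0]
    return sorted(grown, key=lambda item: item[1], reverse=True)[:top]
```

Calling `growth(first_snapshot_text, later_snapshot_text)` on histograms taken hours apart gives a ranked list of growing classes. A class that keeps climbing across several `:live` snapshots (which force a full GC first) is a candidate for a leak, though growth alone does not prove one.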
The sample program used for the stress test and heap dumps ranging from 700 to 2500 GB can be uploaded on request. They are too big for the Jira backing store.