Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.14.4
Description
We run a Flink cluster on k8s/OpenShift in session mode to execute our Apache Beam pipelines. Some of these are streaming pipelines that run continuously; others are batch pipelines submitted periodically whenever there is load to process.
We believe the batch pipelines cause the issue. We submit one to several batch jobs every 5 minutes. For each job, a new ChildFirstClassLoader is instantiated, and it looks like these instances are not closed properly after the job finishes.
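To illustrate the suspected mechanism, here is a minimal sketch that does not use Flink at all. It assumes a plain `URLClassLoader` as a stand-in for the per-job class loader and a hypothetical static list (`LINGERING`) as a stand-in for whatever might keep a finished job's loader reachable; we have not identified the actual holder in our cluster. As long as any strong reference remains, the loader and the class metadata it owns cannot be collected, even after `close()`.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderLeakSketch {

    // Hypothetical stand-in for anything that outlives a job and still
    // references its loader (a static cache, a lingering thread, ...).
    public static final List<ClassLoader> LINGERING = new ArrayList<>();

    // Simulates one job: a fresh class loader is created per submission.
    public static WeakReference<ClassLoader> runJob(boolean leak) throws Exception {
        URLClassLoader loader =
                new URLClassLoader(new URL[0], ClassLoaderLeakSketch.class.getClassLoader());
        if (leak) {
            LINGERING.add(loader); // strong reference survives the job
        }
        // close() releases opened resources, but does not by itself
        // make the loader eligible for garbage collection.
        loader.close();
        return new WeakReference<>(loader);
    }

    public static void main(String[] args) throws Exception {
        WeakReference<ClassLoader> leaked = runJob(true);
        WeakReference<ClassLoader> clean = runJob(false);
        System.gc(); // only a hint to the JVM; collection is not guaranteed
        System.out.println("leaked loader still reachable: " + (leaked.get() != null));
        System.out.println("clean loader still reachable:  " + (clean.get() != null));
    }
}
```

The first line always reports `true`, because the list still holds the loader. The second usually reports `false` on HotSpot, but that depends on the garbage collector actually running.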
Attached is a screenshot of the Leak Suspects report from the Eclipse Memory Analyzer. When the heap dump was captured, 2 streaming jobs and several batch jobs were running, and over 100 batch jobs had already finished.
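As a lighter-weight complement to a heap dump, class loading counters and metaspace usage can be read through the standard JMX beans. The sketch below is not part of our setup: the class name `MetaspaceProbe` is hypothetical, the pool name `"Metaspace"` is an assumption that holds on HotSpot JVMs, and the code has to run inside (or be adapted to connect remotely to) the JVM being inspected. If finished jobs were cleaned up properly, the unloaded-class count would be expected to grow over time.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MetaspaceProbe {

    // Returns the used bytes of the "Metaspace" pool, or -1 if this JVM
    // does not expose a pool with that name (assumption: HotSpot naming).
    public static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    // One-line summary of class loading activity in the current JVM.
    public static String snapshot() {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        return String.format(
                "loaded=%d unloaded=%d metaspaceUsedBytes=%d",
                cl.getLoadedClassCount(),
                cl.getUnloadedClassCount(),
                metaspaceUsedBytes());
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Taking a snapshot before and after a batch of jobs finishes gives a rough indication of whether their classes are ever unloaded.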
In our current setup, we allocate 8GB for the metaspace:
The top components from the memory analyzer: