[FLINK-9080] Flink Scheduler goes OOM, suspecting a memory leak - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 1.4.0
Fix Version/s: None
Component/s: Runtime / Coordination
Labels: None

Description

Running Flink version 1.4.0 on Mesos. The scheduler runs along with the job manager in a single container, whereas the task managers run in separate containers.

A couple of jobs were running continuously, and the Flink scheduler was working properly along with the task managers. Due to some change in the data, one of the jobs started failing continuously. In the meantime, there was a surge in the Flink scheduler's memory usage, and it eventually died of OOM.

A memory dump analysis was done. The findings were as follows:

- The majority of the top-level packages retaining heap pointed towards FlinkUserCodeClassLoader, glassfish (the Jersey library), and Finalizer classes (see the attached top-level packages image).
- The top-level classes belonged to FlinkUserCodeClassLoader (see the attached top-level classes image).
- The number of classes unloaded was very low compared with the number loaded, in spite of adding the JVM options -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled (see the attached classes loaded vs unloaded graph; the scheduler was restarted 3 times).
- There were also custom classes that were duplicated during subsequent class uploads.
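
For anyone wanting to reproduce the loaded vs unloaded comparison without a profiler, a minimal sketch using the standard JMX `ClassLoadingMXBean` could look like the following. The class name `ClassLoadStats` and the `snapshot` helper are hypothetical names chosen for illustration and are not part of Flink; the sketch only reads counters from the JVM it runs in, so it would have to be run inside (or adapted to attach to) the scheduler process.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Flink: reads the JVM's class loading counters.
public class ClassLoadStats {

    /** Returns {currently loaded, total loaded since JVM start, total unloaded}. */
    public static long[] snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return new long[] {
            bean.getLoadedClassCount(),      // classes currently loaded
            bean.getTotalLoadedClassCount(), // classes loaded since JVM start
            bean.getUnloadedClassCount()     // classes unloaded since JVM start
        };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.printf("loaded=%d totalLoaded=%d unloaded=%d%n", s[0], s[1], s[2]);
    }
}
```

If the unloaded count stays close to zero while the total loaded count keeps climbing across job restarts, that would be consistent with the graph described above, though it does not by itself identify what is holding the class loaders alive.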

Please find attached all the images from the heap dump. Can you suggest some pointers on how to overcome this issue?

Attachments

- Top Level packages.JPG (86 kB), 26/Mar/18 13:15, Rohit Singh
- Top level classes.JPG (46 kB), 26/Mar/18 13:15, Rohit Singh
- Screenshot 2018-12-18 at 12.14.11.png (176 kB), 19/Dec/18 15:43, Nawaid Shamim
- classesloaded vs unloaded.png (48 kB), 26/Mar/18 13:19, Rohit Singh
- class_loader_leak.png (93 kB), 13/May/19 08:21, Michel Davit

Issue Links

relates to

- FLINK-11205 Task Manager Metaspace Memory Leak (Closed)
- FLINK-10317 Configure Metaspace size by default (Closed)

People

Assignee: Stefan Richter
Reporter: Rohit Singh
Votes: 0
Watchers: 11

Dates

Created: 26/Mar/18 13:16
Updated: 29/Jan/21 11:29
Resolved: 29/Jan/21 11:29