[FLINK-8622] flink-mesos: High memory usage of scheduler + job manager. GC never kicks in. - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Abandoned
Affects Version/s: 1.3.2, 1.4.0
Fix Version/s: None
Component/s: Deployment / Mesos, Runtime / Coordination
Labels: None

Description

We are deploying a Flink cluster with 1 JobManager and 6 TaskManagers on Mesos.

We have observed that the memory usage of the JobManager is high. No matter how much memory we allocate to it, it hits the limit within minutes.

We started with 1.5 GB of RAM and a 1 GB heap. Currently we allocate 4 GB of RAM and a 3 GB heap to the combined JobManager/scheduler. We also tried 8 GB of RAM with a relatively smaller heap (the same 3 GB). In that case, too, the memory graph was identical.

As the attached graph (flink-mem-usage-graph-for-jira.png) shows, the scheduler almost always runs at its maximum memory allocation.

Throughout the scheduler's run, we never see memory usage go down unless the process is killed due to OOM. From this we infer that garbage collection never happens.
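
A JVM often does not return heap memory to the OS after a collection frees objects, so a process- or container-level memory graph can stay flat even while GC is running. One way to test the inference above is to read the JVM's own GC counters. The sketch below uses the standard java.lang.management API and reports on the JVM it runs in; for the JobManager itself, the same MXBeans can be viewed through a JMX client such as JConsole (assuming remote JMX is enabled), or with `jstat -gcutil <pid>`. The class name GcActivityCheck is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcActivityCheck {

    // Sums the collection counts of all collectors in this JVM.
    // getCollectionCount() returns -1 when a collector does not define a count, so those are skipped.
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Per-collector counts and accumulated collection time; if these grow over time, GC is running.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // "used" drops after a collection; "committed" is memory the JVM has obtained from the OS
        // and often stays flat, which is roughly what a container-level graph shows.
        System.out.printf("heap used=%d MB, committed=%d MB, max=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
        System.out.println("total collections so far: " + totalCollections());
    }
}
```

If the collection counts keep rising while "committed" stays near the maximum, GC is working and the flat graph reflects committed rather than live memory.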

We have tried both Flink 1.4 and 1.3 and see the same issue on both versions.

Is there any way we can find out where and how memory is being used?
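
A common way to see where heap memory goes is to take a heap dump and open it in a heap analyzer. From the command line this is usually `jmap -dump:live,format=b,file=<file>.hprof <pid>`. The sketch below does the same programmatically, assuming a HotSpot/OpenJDK JVM (it uses com.sun.management.HotSpotDiagnosticMXBean); the class and method names are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {

    // Writes an .hprof heap dump of the current JVM into a fresh temporary directory.
    // liveOnly=true restricts the dump to objects that are still reachable.
    public static Path dumpToTempFile(boolean liveOnly) {
        try {
            // dumpHeap refuses to overwrite an existing file, hence the fresh directory;
            // recent JDKs also require the ".hprof" extension.
            Path out = Files.createTempDirectory("heapdump").resolve("jobmanager.hprof");
            HotSpotDiagnosticMXBean diag =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            diag.dumpHeap(out.toString(), liveOnly);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dump = dumpToTempFile(true);
        System.out.println("heap dump written to " + dump + " (" + dump.toFile().length() + " bytes)");
    }
}
```

Comparing a dump taken shortly after startup with one taken near the limit is one way to see which object types grow.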

Are there any Flink configuration options for the JobManager, or JVM parameters, that can help us restrict memory usage, force garbage collection, and prevent it from crashing?
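
The heap is capped by -Xmx, but a JVM process also uses memory outside the heap (Metaspace, code cache, thread stacks, direct buffers). HotSpot can bound some of this separately with flags such as -XX:MaxMetaspaceSize and -XX:MaxDirectMemorySize, so a container limit generally needs headroom above the heap size. System.gc() only requests a collection and cannot prevent an OOM if live data genuinely exceeds the heap. Before tuning, it may help to confirm which options the JobManager JVM actually received (for example, after passing them through env.java.opts in flink-conf.yaml). A minimal sketch, assuming it runs inside the JVM being inspected; the class name JvmMemorySettings is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.List;

public class JvmMemorySettings {

    // The options the JVM was actually started with, e.g. -Xmx3g or -XX:MaxMetaspaceSize=...
    public static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // The effective heap ceiling in MB (Long.MAX_VALUE >> 20 if the JVM reports no limit).
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() >> 20;
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("effective max heap: " + maxHeapMb() + " MB");
        // Non-heap pools (e.g. Metaspace, Code Cache) add to the process footprint on top of -Xmx.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (pool.getType() == MemoryType.NON_HEAP && usage != null) {
                System.out.printf("non-heap pool %s: used=%d MB%n", pool.getName(), usage.getUsed() >> 20);
            }
        }
    }
}
```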

Please let us know if there are any resource recommendations from Flink for running Flink on Mesos at scale.

Attachments

flink-mem-usage-graph-for-jira.png (39 kB, uploaded 09/Feb/18 12:53 by Bhumika Bayani)

People

Assignee: Unassigned

Reporter: Bhumika Bayani

Votes: 0

Watchers: 3

Dates

Created: 09/Feb/18 10:05

Updated: 31/Jan/20 11:11

Resolved: 31/Jan/20 11:11