Flink / FLINK-8622

flink-mesos: High memory usage of scheduler + job manager. GC never kicks in.

Details

    Description

      We are deploying a Flink cluster with 1 JobManager and 6 TaskManagers on Mesos.

      We have observed that the JobManager's memory usage is high. No matter how much more memory we allocate to it, it hits the limit within minutes.

      We started with 1.5 GB RAM and a 1 GB heap. Currently the JobManager, which also runs the scheduler, has 4 GB RAM and a 3 GB heap. We also tried 8 GB RAM while keeping the heap at 3 GB; the memory graph was identical in that case too.

      As the attached graph (flink-mem-usage-graph-for-jira.png) shows, the scheduler almost always runs at its maximum memory allocation.

      Throughout the scheduler's run, memory usage never goes down until the process is killed due to OOM. From this we infer that garbage collection never happens.

      We have tried Flink versions 1.3 and 1.4 and see the same issue on both.

      Is there any way to find out where and how the memory is being used?
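
For reference, one generic starting point for this question is the JDK's standard `java.lang.management` API. It reports heap and non-heap usage separately and exposes a collection count per garbage collector. The sketch below is not Flink code: the class name `MemorySnapshot` and its helper methods are hypothetical, and it only describes the JVM it runs in, so for a JobManager the same values would have to be read from inside that process or over JMX. Splitting heap from non-heap may be relevant here because the container limit (4 GB or 8 GB RAM) is larger than the 3 GB heap.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: summarizes memory and GC activity
// of the JVM it is running in, using only standard JDK management beans.
public class MemorySnapshot {

    /** Returns one line for heap, one for non-heap, then one per garbage collector. */
    public static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();

        lines.add("heap used=" + toMb(heap.getUsed()) + " MB, max=" + toMb(heap.getMax()) + " MB");
        lines.add("non-heap used=" + toMb(nonHeap.getUsed()) + " MB");

        // A collection count that stays at 0 would support the "GC never runs" inference;
        // a growing count with no drop in usage points at live (retained) objects instead.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add("gc " + gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    /** Converts bytes to whole megabytes; negative input means "undefined" and yields -1. */
    public static long toMb(long bytes) {
        return bytes < 0 ? -1 : bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        snapshot().forEach(System.out::println);
    }
}
```

If heap usage stays well below its maximum while the container still reaches its memory limit, the growth is more likely outside the Java heap, which these counters alone cannot attribute to a specific component.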

      Are there any Flink config options for the JobManager, or JVM parameters, that can help us restrict memory usage, force garbage collection, and prevent the JobManager from crashing?

      Please let us know if there are any resource recommendations for running Flink on Mesos at scale.

      Attachments

        1. flink-mem-usage-graph-for-jira.png
          39 kB
          Bhumika Bayani

          People

            Assignee: Unassigned
            Reporter: Bhumika Bayani (bbayani)