Flink / FLINK-28248

Metaspace memory is leaking when repeatedly submitting Beam batch pipelines via the REST API

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.14.4
    • Fix Version/s: None
    • Component/s: API / Core
    • Labels: None

    Description

      We run a Flink cluster on k8s/OpenShift in session mode to execute our Apache Beam pipelines. Some of these are streaming pipelines that run continuously; others are batch pipelines that we submit periodically whenever there is a load to be processed.

      We believe that the batch pipelines cause the issue. We submit one or more batch jobs every 5 minutes. For each job, a new ChildFirstClassLoader instance is created, and it looks like these class loaders are not closed properly after the job finishes, so the classes they loaded stay in the metaspace.
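
      One way to check this suspicion without a heap dump is to watch the JVM's class-loading counters across several job completions. The sketch below is only an illustration: ClassLoadingSnapshot is a hypothetical helper, not part of Flink or Beam, and it has to run inside the JVM being observed (or the same counters have to be read over JMX). If the number of currently loaded classes keeps growing after each finished batch job while the unloaded count stays flat, that is consistent with per-job class loaders not being released.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper (not part of Flink or Beam): point-in-time class-loading counters. */
final class ClassLoadingSnapshot {
    final int currentlyLoaded; // classes loaded in the JVM right now
    final long totalLoaded;    // classes loaded since JVM start
    final long unloaded;       // classes unloaded since JVM start

    private ClassLoadingSnapshot(int currentlyLoaded, long totalLoaded, long unloaded) {
        this.currentlyLoaded = currentlyLoaded;
        this.totalLoaded = totalLoaded;
        this.unloaded = unloaded;
    }

    /** Reads the counters of the JVM this code runs in. */
    static ClassLoadingSnapshot capture() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return new ClassLoadingSnapshot(
                bean.getLoadedClassCount(),
                bean.getTotalLoadedClassCount(),
                bean.getUnloadedClassCount());
    }

    /** Change in currently loaded classes relative to an earlier snapshot. */
    long growthSince(ClassLoadingSnapshot earlier) {
        return (long) currentlyLoaded - earlier.currentlyLoaded;
    }

    @Override
    public String toString() {
        return "loaded=" + currentlyLoaded
                + " totalLoaded=" + totalLoaded
                + " unloaded=" + unloaded;
    }

    public static void main(String[] args) {
        ClassLoadingSnapshot before = capture();
        // ... in a real check, wait here for one or more batch jobs to finish ...
        ClassLoadingSnapshot after = capture();
        System.out.println(before);
        System.out.println(after);
        System.out.println("growth=" + after.growthSince(before));
    }
}
```

      Comparing snapshots taken a few submission cycles apart should be enough to see a trend; a single pair of readings says little, because class unloading only happens during garbage collection.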

      Attached is a screenshot of the Leak Suspects report from the Eclipse Memory Analyzer. When the heap dump was captured, 2 streaming jobs and several batch jobs were running, and over 100 batch jobs had already finished.

      In our current setup, we allocate 8 GB for the metaspace (see the attached screenshots).
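
      To see how much of that limit is actually in use, the metaspace pool can be read through the standard java.lang.management API. This is a minimal sketch under two assumptions: MetaspaceUsage is a hypothetical helper name, and the pool is called "Metaspace", which is what HotSpot JVMs report; other JVMs may use a different name, in which case the lookup returns an empty result. In Flink the limit itself is presumably set with taskmanager.memory.jvm-metaspace.size, but the report does not say how the 8 GB was configured.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;

/** Hypothetical helper (not part of Flink or Beam): reads metaspace usage of the current JVM. */
final class MetaspaceUsage {
    // Pool name reported by HotSpot; assumed here, other JVMs may differ.
    private static final String POOL_NAME = "Metaspace";

    /** Returns the current metaspace usage, or empty if no such pool is exposed. */
    static Optional<MemoryUsage> read() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (POOL_NAME.equals(pool.getName())) {
                // getUsage() may return null if the pool is no longer valid.
                return Optional.ofNullable(pool.getUsage());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<MemoryUsage> usage = read();
        if (usage.isPresent()) {
            MemoryUsage u = usage.get();
            // getMax() is -1 when no limit (-XX:MaxMetaspaceSize) is defined.
            System.out.println("used=" + u.getUsed()
                    + " committed=" + u.getCommitted()
                    + " max=" + u.getMax());
        } else {
            System.out.println("No pool named " + POOL_NAME + " found");
        }
    }
}
```

      Logging this value after each batch submission would show whether usage climbs steadily toward the limit or levels off once old class loaders are collected.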

       

      The top components reported by the memory analyzer are shown in the attached screenshots.

      Attachments

        1. image-2022-06-24-14-45-51-689.png
          131 kB
          Arkadiusz Gasinski
        2. image-2022-06-24-14-51-47-909.png
          169 kB
          Arkadiusz Gasinski
        3. image-2022-06-24-15-07-43-035.png
          476 kB
          Arkadiusz Gasinski
        4. image-2022-07-05-15-47-45-038.png
          120 kB
          Arkadiusz Gasinski
        5. image-2022-07-05-15-51-05-840.png
          133 kB
          Arkadiusz Gasinski
        6. image-2022-07-05-15-58-43-448.png
          137 kB
          Arkadiusz Gasinski

          People

            Assignee: Unassigned
            Reporter: Arkadiusz Gasinski (jigga)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: