Flink / FLINK-20333

Flink standalone cluster throws metaspace OOM after submitting multiple PyFlink UDF jobs.


    Details

      Description

      A Flink standalone cluster currently throws a metaspace OutOfMemoryError after multiple PyFlink UDF jobs have been submitted. The root cause is that the PyFlink classes are loaded by the user classloader, so each job creates a separate user classloader to load the PyFlink-related classes. Many soft references and finalizers (introduced by the underlying Netty) remain in memory and prevent the user classloaders of already finished PyFlink jobs from being garbage collected.

      Because of these references, reclaiming the classloader of a completed job takes multiple full GCs. If only one full GC has run by the time the metaspace is exhausted, the OOM occurs.
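
      To observe the behaviour described above, one option is to sample the JVM's class-loading counters and metaspace usage between job submissions. The following is only a minimal diagnostic sketch, not part of Flink: the class name MetaspaceProbe and its helper methods are hypothetical, and the memory pool name "Metaspace" is an assumption that holds on HotSpot JVMs (Java 8 and later) but may differ on other JVMs. It uses only the standard java.lang.management API.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical diagnostic helper; not part of Flink or PyFlink.
public class MetaspaceProbe {

    /** Bytes used by the pool named "Metaspace" (HotSpot naming), or -1 if no such pool exists. */
    public static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    /** Number of classes currently loaded in this JVM. */
    public static int loadedClassCount() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getLoadedClassCount();
    }

    /** Number of classes unloaded since the JVM started. */
    public static long unloadedClassCount() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getUnloadedClassCount();
    }

    public static void main(String[] args) {
        // Print one snapshot; in practice this would be sampled repeatedly.
        System.out.println("metaspace used (bytes): " + metaspaceUsedBytes());
        System.out.println("classes loaded:         " + loadedClassCount());
        System.out.println("classes unloaded:       " + unloadedClassCount());
    }
}
```

      If the loaded-class count and metaspace usage keep growing after jobs finish while the unloaded-class count stays flat, that is consistent with the user classloaders of completed jobs not being reclaimed.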

       


              People

              • Assignee:
                Wei Zhong (zhongwei)
                Reporter:
                Wei Zhong (zhongwei)
              • Votes:
                1
                Watchers:
                3
