FLINK-13472

taskmanager.jvm-exit-on-oom doesn't work reliably with YARN

Details

    Description

      I added the taskmanager.jvm-exit-on-oom flag to the task manager startup arguments. During testing (simulating OOM), I noticed that some YARN containers remained in the RUNNING state even though, with the flag on, they should have been killed on an OutOfMemoryError.

      I found RUNNING containers whose last log lines looked like this:

      2019-07-26 13:32:51,396 ERROR org.apache.flink.runtime.taskmanager.Task                     - Encountered fatal error java.lang.OutOfMemoryError - terminating the JVM
      java.lang.OutOfMemoryError: Metaspace
      	at java.lang.ClassLoader.defineClass1(Native Method)
      	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
      	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
      	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
      	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)

       

      Does YARN make it difficult to forcefully kill the JVM after an OutOfMemoryError?

      Workaround

      With the -XX:+ExitOnOutOfMemoryError JVM flag, the containers are always terminated.
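
To confirm that the workaround is actually in effect inside a container, one option is to inspect the arguments the JVM was started with. The following is a minimal sketch, not part of Flink: the class name OomFlagCheck and the method hasExitOnOomFlag are hypothetical, and it assumes the flag reaches the task manager JVM as a plain startup argument (for instance through Flink's env.java.opts.taskmanager option). It also assumes that when the flag is given more than once, the last occurrence wins.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not a Flink class): reports whether the running JVM
// was started with -XX:+ExitOnOutOfMemoryError.
public class OomFlagCheck {
    static final String ENABLED = "-XX:+ExitOnOutOfMemoryError";
    static final String DISABLED = "-XX:-ExitOnOutOfMemoryError";

    // Scans the argument list in order; a later occurrence of the flag
    // overrides an earlier one (assumed last-wins behaviour).
    public static boolean hasExitOnOomFlag(List<String> jvmArgs) {
        boolean enabled = false;
        for (String arg : jvmArgs) {
            if (ENABLED.equals(arg)) {
                enabled = true;
            } else if (DISABLED.equals(arg)) {
                enabled = false;
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        // Startup arguments of the current JVM, excluding main() arguments.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("ExitOnOutOfMemoryError enabled: " + hasExitOnOomFlag(jvmArgs));
    }
}
```

Running main with and without the flag on the command line shows whether it was picked up. The check only reads startup arguments, so it would not detect a flag changed at runtime through other means.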

          People

            Assignee: Unassigned
            Reporter: Pawel Bartoszek (pawelbartoszek)