FLINK-17514: TaskCancelerWatchdog does not kill TaskManager

    Details

      Description

      The watchdog reports a fatal error using taskManager.notifyFatalError(msg, null), which should normally terminate the TaskManager. However, the code introduced in FLINK-16225
      inspects the passed exception, which is null in this call, and fails with a NullPointerException. As a result, the TaskManager is never terminated.

      Stacktrace:

      2020-05-05 09:43:01,588 ERROR org.apache.flink.runtime.taskmanager.Task                     - Task did not exit gracefully within 180 + seconds.
      2020-05-05 09:43:01,588 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor            - Task did not exit gracefully within 180 + seconds.
      2020-05-05 09:43:01,588 ERROR org.apache.flink.runtime.taskmanager.Task                     - Error in Task Cancellation Watch Dog
      java.lang.NullPointerException
      	at org.apache.flink.util.ExceptionUtils.isOutOfMemoryErrorWithMessageStartingWith(ExceptionUtils.java:186)
      	at org.apache.flink.util.ExceptionUtils.isMetaspaceOutOfMemoryError(ExceptionUtils.java:170)
      	at org.apache.flink.util.ExceptionUtils.enrichTaskManagerOutOfMemoryError(ExceptionUtils.java:144)
      	at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.onFatalError(TaskManagerRunner.java:249)
      	at org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.notifyFatalError(TaskExecutor.java:1751)
      	at org.apache.flink.runtime.taskmanager.Task$TaskCancelerWatchDog.run(Task.java:1514)
      	at java.lang.Thread.run(Thread.java:748)
      

              People

              • Assignee: Till Rohrmann (trohrmann)
              • Reporter: Aljoscha Krettek (aljoscha)