Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
1.10.1, 1.11.0
Description
The task cancellation watchdog (Task$TaskCancelerWatchDog) reports a fatal error by calling taskManager.notifyFatalError(msg, null), which should lead to the TaskManager being terminated. However, the code introduced in FLINK-16225 inspects the passed exception to check whether it is a Metaspace OutOfMemoryError, and it does so without checking for null. It therefore fails with a NullPointerException, which prevents the TaskManager from being terminated.
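The failure mode can be sketched in a few lines. This is a hypothetical, simplified model, not Flink's actual ExceptionUtils code: the class and method names are illustrative. It assumes the check compares the exact class of the throwable (so that only JVM-generated OutOfMemoryErrors match) and then inspects its message. The "unsafe" variant dereferences a null cause and throws, while the "safe" variant treats a missing cause as "not a Metaspace OOM".

```java
// Hypothetical, simplified model of the bug; names are illustrative and
// do NOT reproduce Flink's real ExceptionUtils implementation.
final class FatalErrorNullCheck {

    // Dereferences the throwable directly: t.getClass() throws a
    // NullPointerException when the caller passed null as the cause.
    static boolean isOomWithMessagePrefixUnsafe(Throwable t, String prefix) {
        return t.getClass() == OutOfMemoryError.class
                && t.getMessage() != null
                && t.getMessage().startsWith(prefix);
    }

    // Null-safe variant: a null cause simply means "not an OOM of this kind".
    static boolean isOomWithMessagePrefixSafe(Throwable t, String prefix) {
        return t != null
                && t.getClass() == OutOfMemoryError.class
                && t.getMessage() != null
                && t.getMessage().startsWith(prefix);
    }

    public static void main(String[] args) {
        // Mirrors the watchdog path, which passes null as in notifyFatalError(msg, null).
        System.out.println(isOomWithMessagePrefixSafe(null, "Metaspace"));
        try {
            isOomWithMessagePrefixUnsafe(null, "Metaspace");
        } catch (NullPointerException e) {
            System.out.println("unsafe check threw NullPointerException");
        }
    }
}
```

Running main prints `false` followed by the NullPointerException message. In the real system that exception escapes the fatal-error handler, so the termination step is never reached.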
Stacktrace:
2020-05-05 09:43:01,588 ERROR org.apache.flink.runtime.taskmanager.Task - Task did not exit gracefully within 180 + seconds.
2020-05-05 09:43:01,588 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - Task did not exit gracefully within 180 + seconds.
2020-05-05 09:43:01,588 ERROR org.apache.flink.runtime.taskmanager.Task - Error in Task Cancellation Watch Dog
java.lang.NullPointerException
	at org.apache.flink.util.ExceptionUtils.isOutOfMemoryErrorWithMessageStartingWith(ExceptionUtils.java:186)
	at org.apache.flink.util.ExceptionUtils.isMetaspaceOutOfMemoryError(ExceptionUtils.java:170)
	at org.apache.flink.util.ExceptionUtils.enrichTaskManagerOutOfMemoryError(ExceptionUtils.java:144)
	at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.onFatalError(TaskManagerRunner.java:249)
	at org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.notifyFatalError(TaskExecutor.java:1751)
	at org.apache.flink.runtime.taskmanager.Task$TaskCancelerWatchDog.run(Task.java:1514)
	at java.lang.Thread.run(Thread.java:748)
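The stack trace shows the null cause travelling from the watchdog through notifyFatalError into onFatalError, where it is inspected. Besides making the inspection null-safe, the call site could avoid passing null in the first place. The sketch below illustrates that alternative; FatalErrorHandler and WatchdogFatalErrorSketch are hypothetical names rather than Flink's API, and the choice of IllegalStateException as the wrapper type is an assumption.

```java
// Hypothetical sketch of a call-site guard; names are illustrative, not Flink's API.
final class WatchdogFatalErrorSketch {

    // Stand-in for a fatal-error callback that, like TaskManagerRunner.onFatalError,
    // may inspect the throwable it receives.
    interface FatalErrorHandler {
        void onFatalError(Throwable cause);
    }

    // Never forwards a null cause: if the caller only has a message, it is wrapped
    // in an exception so every downstream handler receives a non-null Throwable.
    static void notifyFatalError(FatalErrorHandler handler, String msg, Throwable cause) {
        Throwable nonNullCause = (cause != null) ? cause : new IllegalStateException(msg);
        handler.onFatalError(nonNullCause);
    }

    public static void main(String[] args) {
        // This handler calls cause.getClass(), which would throw if cause were null.
        FatalErrorHandler handler = cause -> System.out.println(
                "fatal error: " + cause.getClass().getSimpleName() + ": " + cause.getMessage());
        notifyFatalError(handler, "Task did not exit gracefully within 180 + seconds.", null);
    }
}
```

Guarding at the call site protects every handler, while a null-safe check in the handler protects against every caller. The two approaches are complementary, and applying both is a reasonable defensive choice.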
Issue Links
- is caused by: FLINK-16225 Metaspace Out Of Memory should be handled as Fatal Error in TaskManager (Closed)
- links to