Details
- Bug
- Status: Closed
- Critical
- Resolution: Won't Do
- 1.8.0
Description
The test TaskExecutorITCase#testJobReExecutionAfterTaskExecutorTermination is unstable on Travis. It fails with:
16:12:04.644 [ERROR] testJobReExecutionAfterTaskExecutorTermination(org.apache.flink.runtime.taskexecutor.TaskExecutorITCase) Time elapsed: 1.257 s <<< ERROR!
org.apache.flink.util.FlinkException: Could not close resource.
at org.apache.flink.runtime.taskexecutor.TaskExecutorITCase.teardown(TaskExecutorITCase.java:83)
Caused by: org.apache.flink.util.FlinkException: Error while shutting the TaskExecutor down.
Caused by: org.apache.flink.util.FlinkException: Could not properly shut down the TaskManager services.
Caused by: java.lang.IllegalStateException: NetworkBufferPool is not empty after destroying all LocalBufferPools
https://api.travis-ci.org/v3/job/493221318/log.txt
The problem seems to be caused by the TaskExecutor not properly waiting for the termination of all running Tasks. This creates a race condition in which not all buffers have been returned to the BufferPool by the time it is shut down.
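The ordering problem described above can be illustrated with a small standalone sketch. This is not Flink code and not the actual fix: CountingPool, startTasks and shutDown are hypothetical names invented for illustration, and the pool only counts outstanding buffers. The sketch assumes each task holds one buffer until it terminates, and shows that destroying the pool is only safe once every task's termination future has completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownOrderingSketch {

    /** Hypothetical stand-in for a buffer pool; it only counts outstanding buffers. */
    static final class CountingPool {
        private final AtomicInteger outstanding = new AtomicInteger();

        void request() { outstanding.incrementAndGet(); }
        void recycle() { outstanding.decrementAndGet(); }
        int outstanding() { return outstanding.get(); }

        /** Fails if buffers are still held, similar in spirit to the check in the stack trace. */
        void destroy() {
            int left = outstanding.get();
            if (left != 0) {
                throw new IllegalStateException("pool is not empty: " + left + " buffer(s) outstanding");
            }
        }
    }

    /** Starts n tasks; each holds one buffer and recycles it only when it terminates. */
    static List<CompletableFuture<Void>> startTasks(CountingPool pool, int n, CountDownLatch cancel) {
        List<CompletableFuture<Void>> terminationFutures = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            pool.request(); // buffer is taken before the task body runs
            terminationFutures.add(CompletableFuture.runAsync(() -> {
                try {
                    cancel.await(); // the task "runs" until it is cancelled
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    pool.recycle(); // buffer comes back only on task termination
                }
            }));
        }
        return terminationFutures;
    }

    /** Destroys the pool only after all termination futures have completed. */
    static CompletableFuture<Void> shutDown(CountingPool pool, List<CompletableFuture<Void>> terminationFutures) {
        return CompletableFuture
                .allOf(terminationFutures.toArray(new CompletableFuture[0]))
                .thenRun(pool::destroy);
    }

    public static void main(String[] args) {
        CountingPool pool = new CountingPool();
        CountDownLatch cancel = new CountDownLatch(1);
        List<CompletableFuture<Void>> tasks = startTasks(pool, 3, cancel);

        // Destroying without waiting reproduces the failure mode: buffers are still held.
        try {
            pool.destroy();
        } catch (IllegalStateException e) {
            System.out.println("premature destroy failed: " + e.getMessage());
        }

        // Waiting for task termination first makes the destroy check pass.
        CompletableFuture<Void> shutdown = shutDown(pool, tasks);
        cancel.countDown();
        shutdown.join();
        System.out.println("outstanding after shutdown: " + pool.outstanding());
    }
}
```

In the sketch the latch makes the outcome deterministic. In the failing test the same ordering is decided by thread timing, which would explain why the failure only shows up intermittently.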
Issue Links
- is caused by FLINK-11630 TaskExecutor does not wait for Task termination when terminating itself (Closed)