[FLINK-11631] TaskExecutorITCase#testJobReExecutionAfterTaskExecutorTermination unstable on Travis - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Won't Do
Affects Version/s: 1.8.0
Fix Version/s: 1.10.0
Component/s: Runtime / Coordination, Tests
Labels:
- pull-request-available
- test-stability

Description

The test TaskExecutorITCase#testJobReExecutionAfterTaskExecutorTermination is unstable on Travis. It fails with:

16:12:04.644 [ERROR] testJobReExecutionAfterTaskExecutorTermination(org.apache.flink.runtime.taskexecutor.TaskExecutorITCase)  Time elapsed: 1.257 s  <<< ERROR!
org.apache.flink.util.FlinkException: Could not close resource.
	at org.apache.flink.runtime.taskexecutor.TaskExecutorITCase.teardown(TaskExecutorITCase.java:83)
Caused by: org.apache.flink.util.FlinkException: Error while shutting the TaskExecutor down.
Caused by: org.apache.flink.util.FlinkException: Could not properly shut down the TaskManager services.
Caused by: java.lang.IllegalStateException: NetworkBufferPool is not empty after destroying all LocalBufferPools

https://api.travis-ci.org/v3/job/493221318/log.txt

The problem seems to be caused by the TaskExecutor not properly waiting for all running Tasks to terminate. This leads to a race condition in which not all buffers have been returned to the BufferPool.
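The ordering problem described above can be illustrated with a minimal Java sketch. It does not use Flink's classes: ToyBufferPool, ToyTask, and shutDown are hypothetical stand-ins invented for illustration, and the sketch only assumes that each task exposes a future that completes once the task has returned its buffers. Destroying the pool only after all of those futures complete avoids the "pool is not empty" failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustration only: hypothetical stand-ins, not Flink's TaskExecutor or NetworkBufferPool. */
final class ShutdownOrderSketch {

    /** Toy pool that counts lent-out buffers and refuses to be destroyed while any are outstanding. */
    static final class ToyBufferPool {
        private final AtomicInteger outstanding = new AtomicInteger();

        void request() { outstanding.incrementAndGet(); }

        void recycle() { outstanding.decrementAndGet(); }

        void destroy() {
            int left = outstanding.get();
            if (left != 0) {
                throw new IllegalStateException("pool is not empty: " + left + " buffer(s) outstanding");
            }
        }
    }

    /** Toy task that holds one buffer from creation until it terminates. */
    static final class ToyTask {
        private final ToyBufferPool pool;
        private final CompletableFuture<Void> terminationFuture = new CompletableFuture<>();

        ToyTask(ToyBufferPool pool) {
            this.pool = pool;
            pool.request();
        }

        /** Returns the buffer first, then signals that the task has terminated. */
        void terminate() {
            pool.recycle();
            terminationFuture.complete(null);
        }

        CompletableFuture<Void> getTerminationFuture() { return terminationFuture; }
    }

    /** Destroys the pool only after every task has signalled termination. */
    static CompletableFuture<Void> shutDown(List<ToyTask> tasks, ToyBufferPool pool) {
        CompletableFuture<?>[] terminations = tasks.stream()
                .map(ToyTask::getTerminationFuture)
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(terminations).thenRun(pool::destroy);
    }

    public static void main(String[] args) {
        ToyBufferPool pool = new ToyBufferPool();
        List<ToyTask> tasks = new ArrayList<>();
        tasks.add(new ToyTask(pool));
        tasks.add(new ToyTask(pool));

        // Shutdown is requested while both tasks still hold buffers.
        CompletableFuture<Void> shutdown = shutDown(tasks, pool);
        System.out.println("shutdown finished before tasks terminated: " + shutdown.isDone());

        // Once the tasks terminate, the pool is destroyed without an exception.
        tasks.forEach(ToyTask::terminate);
        shutdown.join();
        System.out.println("pool destroyed cleanly");
    }
}
```

Calling destroy() directly while a task is still running throws an IllegalStateException in this sketch, which mirrors the shape of the failure in the log above. Whether the actual fix should chain futures in this way is an assumption, not something stated in this issue.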

Issue Links

is caused by: FLINK-11630 TaskExecutor does not wait for Task termination when terminating itself (Closed)

links to: GitHub Pull Request #9147

People

Assignee: Biao Liu
Reporter: Till Rohrmann
Votes: 0
Watchers: 5

Dates

Created: 15/Feb/19 15:54
Updated: 15/Aug/19 10:33
Resolved: 15/Aug/19 10:33

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 0.5h