Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Duplicate
Description
When writing TaskManager failover tests with the unified testing framework for connectors, I found that a test
can get stuck in CommonTestUtils.waitForJobStatus(), as follows:
- triggerTaskManagerFailover is called.
- JobStatus switches from RUNNING to RESTARTING.
- JobStatus switches from RESTARTING to RUNNING.
- The method terminateTaskManager() completes.
- Since the JobStatus is already RUNNING again, CommonTestUtils.waitForJobStatus() never returns.
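The sequence above can be illustrated with a small, self-contained simulation. This is only a sketch: `MissedRestartSketch`, its `Status` enum, and `firstMatch` are hypothetical names invented for illustration and are not part of Flink. The recorded timeline stands in for the job's status over time, and the `from` index stands in for the moment polling begins.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class MissedRestartSketch {

    /** Hypothetical stand-in for the job statuses named in this report. */
    enum Status { RUNNING, RESTARTING, FAILING, FAILED }

    /**
     * Scans the recorded status timeline starting at index {@code from} and
     * returns the index of the first expected status, or -1 if none is seen.
     */
    static int firstMatch(List<Status> timeline, int from, Set<Status> expected) {
        for (int i = from; i < timeline.size(); i++) {
            if (expected.contains(timeline.get(i))) {
                return i;
            }
        }
        return -1; // a real wait loop would keep polling instead of returning
    }

    public static void main(String[] args) {
        List<Status> timeline =
                List.of(Status.RUNNING, Status.RESTARTING, Status.RUNNING, Status.RUNNING);
        Set<Status> expected =
                EnumSet.of(Status.FAILING, Status.FAILED, Status.RESTARTING);

        // Polling from the start sees the transient RESTARTING status.
        System.out.println(firstMatch(timeline, 0, expected)); // prints 1
        // Polling only after the job has recovered never sees it.
        System.out.println(firstMatch(timeline, 2, expected)); // prints -1
    }
}
```

The second call corresponds to the stuck test: by the time polling starts, the only status left to observe is RUNNING.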
A solution is to call terminateTaskManager() asynchronously and to call CommonTestUtils.waitForJobStatus()
while the termination is still in progress. The pseudocode could look as follows:
public void triggerTaskManagerFailover(JobClient jobClient, Runnable afterFailAction)
        throws Exception {
    CompletableFuture<Void> completableFuture = terminateTaskManager();
    CommonTestUtils.waitForJobStatus(
            jobClient,
            Arrays.asList(JobStatus.FAILING, JobStatus.FAILED, JobStatus.RESTARTING),
            Deadline.fromNow(Duration.ofMinutes(5)));
    completableFuture.get();
    afterFailAction.run();
    startTaskManager();
}
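As a rough, runnable illustration of the same idea without a Flink cluster, the sketch below simulates the job status with an `AtomicReference`. Everything in it is an assumption made for illustration: `AsyncFailoverSketch`, its `Status` enum, `terminateTaskManagerAsync`, `waitForStatus`, and the 300 ms restart time are hypothetical and do not reflect Flink's actual API or timing.

```java
import java.time.Duration;
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

final class AsyncFailoverSketch {

    /** Hypothetical stand-in for the job statuses named in this report. */
    enum Status { RUNNING, RESTARTING, FAILING, FAILED }

    /** Simulated job status; a real test would obtain it through the JobClient. */
    static final AtomicReference<Status> status = new AtomicReference<>(Status.RUNNING);

    /** Simulated asynchronous termination: the job restarts, then recovers. */
    static CompletableFuture<Void> terminateTaskManagerAsync(Duration restartTime) {
        return CompletableFuture.runAsync(() -> {
            status.set(Status.RESTARTING);
            try {
                Thread.sleep(restartTime.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            status.set(Status.RUNNING);
        });
    }

    /** Polls until one of the expected statuses is observed or the timeout expires. */
    static Status waitForStatus(Set<Status> expected, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() - deadline < 0) {
            Status current = status.get();
            if (expected.contains(current)) {
                return current;
            }
            try {
                Thread.sleep(5); // polling interval, much shorter than the restart time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
        throw new IllegalStateException("None of " + expected + " observed within " + timeout);
    }

    /** Starts the termination first, then polls while it is still in progress. */
    static Status triggerFailover(Runnable afterFailAction) {
        CompletableFuture<Void> termination =
                terminateTaskManagerAsync(Duration.ofMillis(300));
        Status observed = waitForStatus(
                EnumSet.of(Status.FAILING, Status.FAILED, Status.RESTARTING),
                Duration.ofSeconds(5));
        termination.join(); // wait for the simulated termination to finish
        afterFailAction.run();
        return observed;
    }
}
```

Because polling overlaps with the termination, the transient RESTARTING status is observed in this simulation. Polling can still miss a restart that is shorter than the polling interval, which may be why the linked issue's title points to detecting restarts through the RestClient instead.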
Issue Links
- duplicates FLINK-23807 Use RestClient to detect restarts in MiniClusterTestEnvironment#triggerTaskManagerFailover (Resolved)