  1. Flink
  2. FLINK-24174

MiniClusterTestEnvironment's triggerTaskManagerFailover may get stuck in CommonTestUtils.waitForJobStatus()

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Test Infrastructure
    • Labels: None

    Description

      When writing TaskManager failover tests with the unified testing framework for connectors, I found that the test may get stuck in CommonTestUtils.waitForJobStatus(). The sequence is as follows:

      1. triggerTaskManagerFailover is called.
      2. The JobStatus switches from RUNNING to RESTARTING.
      3. The JobStatus switches from RESTARTING back to RUNNING.
      4. The method terminateTaskManager() completes.
      5. Since the JobStatus is already RUNNING again, CommonTestUtils.waitForJobStatus(), which only starts after terminateTaskManager() has returned, never observes the expected status and never exits.
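
      The sequence above can be reproduced without a cluster. The following is a minimal sketch, not Flink code: the class MissedTransitionDemo, its Status enum, and the helpers terminateTaskManagerBlocking() and waitForStatus() are hypothetical stand-ins, and the wait uses a short 200 ms timeout so that the demo terminates instead of hanging.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical simulation of the missed transition; none of these names are Flink APIs.
class MissedTransitionDemo {
    enum Status { RUNNING, RESTARTING }

    // Simulated job status, shared between the "cluster" and the test code.
    static final AtomicReference<Status> status = new AtomicReference<>(Status.RUNNING);

    // Blocking termination: the job restarts and recovers before this method returns.
    static void terminateTaskManagerBlocking() {
        status.set(Status.RESTARTING);
        status.set(Status.RUNNING);
    }

    // Polls until one of the expected statuses is seen; returns false on timeout.
    static boolean waitForStatus(Set<Status> expected, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            if (expected.contains(status.get())) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Steps 1 to 5: terminate first, then wait for a status that has already passed.
    static boolean run() {
        status.set(Status.RUNNING);
        terminateTaskManagerBlocking();
        return waitForStatus(EnumSet.of(Status.RESTARTING), 200);
    }

    public static void main(String[] args) {
        System.out.println("saw RESTARTING: " + run()); // prints "saw RESTARTING: false"
    }
}
```

      Because the wait starts only after the transient status is gone, no amount of polling can observe it; only the timeout ends the wait.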

      A solution is to call terminateTaskManager() asynchronously and to call CommonTestUtils.waitForJobStatus() while the termination is still in progress. The pseudocode could look as follows:

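      public void triggerTaskManagerFailover(JobClient jobClient, Runnable afterFailAction)
              throws Exception {
          // Start the termination without blocking (assumes terminateTaskManager()
          // is changed to return a CompletableFuture).
          CompletableFuture<Void> terminationFuture = terminateTaskManager();
          // Wait for a failure-related status while the termination is still in progress.
          CommonTestUtils.waitForJobStatus(
                  jobClient,
                  Arrays.asList(JobStatus.FAILING, JobStatus.FAILED, JobStatus.RESTARTING),
                  Deadline.fromNow(Duration.ofMinutes(5)));
          // Make sure the TaskManager has fully terminated before continuing.
          terminationFuture.get();
          afterFailAction.run();
          startTaskManager();
      }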
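
      The effect of the asynchronous variant can be illustrated with a small cluster-free simulation. This is a sketch under stated assumptions rather than the actual fix: AsyncFailoverDemo, terminateTaskManagerAsync(), and waitForStatus() are hypothetical names, and the simulated job is assumed to stay in RESTARTING for about 300 ms, which is much longer than the 10 ms polling interval.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical simulation of the proposed ordering; none of these names are Flink APIs.
class AsyncFailoverDemo {
    enum Status { RUNNING, RESTARTING }

    static final AtomicReference<Status> status = new AtomicReference<>(Status.RUNNING);

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Assumed async termination: the job is RESTARTING for ~300 ms, then RUNNING again.
    static CompletableFuture<Void> terminateTaskManagerAsync() {
        return CompletableFuture.runAsync(() -> {
            status.set(Status.RESTARTING);
            sleepQuietly(300);
            status.set(Status.RUNNING);
        });
    }

    // Polls until one of the expected statuses is seen; returns false on timeout.
    static boolean waitForStatus(Set<Status> expected, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            if (expected.contains(status.get())) {
                return true;
            }
            sleepQuietly(10);
        }
        return false;
    }

    // Same ordering as the pseudocode: start terminating, wait for the status, then join.
    static boolean triggerFailover() {
        status.set(Status.RUNNING);
        CompletableFuture<Void> termination = terminateTaskManagerAsync();
        boolean sawRestart = waitForStatus(EnumSet.of(Status.RESTARTING), 5_000);
        termination.join(); // termination has finished before any follow-up action runs
        return sawRestart;
    }

    public static void main(String[] args) {
        System.out.println("saw RESTARTING: " + triggerFailover());
    }
}
```

      Note that this ordering still depends on the transient status lasting longer than one polling interval; a restart that completes between two polls could be missed even with the asynchronous call.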
      

            People

              Assignee: Unassigned
              Reporter: Jiangang Liu