FLINK-24161

Cannot stop the job with a savepoint while a task is finishing

Details

    Description

      When stopping the job with a savepoint while one of its tasks is finishing, the stop action times out.

      Testing job: https://github.com/KarmaGYZ/flink/blob/test-147/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java

      Flink conf:

      state.savepoints.dir: file:///tmp/flink-savepoints
      state.backend: rocksdb
      state.backend.incremental: true
      state.checkpoints.dir: file:///tmp/flink-ckp/
      execution.checkpointing.aligned-checkpoint-timeout: 30 s
      execution.checkpointing.interval: 5 s
      taskmanager.numberOfTaskSlots: 2
      execution.checkpointing.checkpoints-after-tasks-finish.enabled: true
      

      How to reproduce:

      bin/flink run -d -p 4 examples/streaming/WordCount.jar
      # while one task is finishing
      bin/flink stop $JOB_ID
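
The two commands above can be wrapped in a small script so that the job ID does not have to be copied by hand. This is only a sketch: it assumes the client prints a line containing "JobID <id>" after a detached submission, and the function names `extract_job_id` and `reproduce` are made up for illustration. Hitting the window in which a task is finishing still depends on timing and may take several attempts.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps; run from the Flink distribution root.

# Read client output on stdin and print the first 32-character hex job ID.
# Assumption: `flink run -d` prints a line containing "JobID <id>".
extract_job_id() {
  grep -oE 'JobID [0-9a-f]{32}' | head -n 1 | cut -d ' ' -f 2
}

# Submit the WordCount example detached, then stop it with a savepoint.
reproduce() {
  local job_id
  job_id="$(bin/flink run -d -p 4 examples/streaming/WordCount.jar | extract_job_id)"
  if [ -z "$job_id" ]; then
    echo "could not determine job ID" >&2
    return 1
  fi
  echo "submitted job $job_id"
  # Timing matters: the stop has to arrive while one task is finishing.
  bin/flink stop "$job_id"
}
```

After sourcing the script, calling `reproduce` runs both steps in sequence.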
      

      Client log:

      ------------------------------------------------------------
       The program finished with the following exception:
      
      org.apache.flink.util.FlinkException: Could not stop with a savepoint job "e139a2eba7f8dc0b07fab65e84421ee4".
        at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:581)
        at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
        at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:569)
        at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1069)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
        at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
      Caused by: java.util.concurrent.TimeoutException
        at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
        at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:579)
        ... 6 more
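
The exception above only says that the client gave up waiting; it does not say what happened to the job. One way to check afterwards, sketched under the assumption that the cluster is still reachable and that `bin/flink list -r` prints one line per running job containing its ID, is shown below. The helper name `job_is_listed` is hypothetical.

```shell
# Succeeds if the given job ID appears in the `flink list` output read from stdin.
job_is_listed() {
  grep -q -- "$1"
}

# Usage (assumes a running cluster and JOB_ID set to the ID from the stop command):
#   bin/flink list -r | job_is_listed "$JOB_ID" && echo "job is still running"
```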
      

            People

              dwysakowicz Dawid Wysakowicz
              guoyangze Yangze Guo