Flink / FLINK-20192

Externalized checkpoint references a checkpoint from a different job


    Description

      When I try to restore from an externalized checkpoint located at: /home/anttkaik/flink/checkpoints/0fc94de8d94e123585b5baed6972dbe8/chk-12 I get the following error:
       

      java.lang.Exception: Exception while creating StreamOperatorStateContext.
          at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:204)
          at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:247)
          at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:290)
          at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:479)
          at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
          at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:475)
          at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:528)
          at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
          at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for FunctionGroupOperator_6b87a4870d0e21cecbbe271bd893cfcc_(2/4) from any of the 1 provided restore options.
          at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
          at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
          at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
          ... 9 more
      Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.
          at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:329)
          at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:535)
          at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:301)
          at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
          at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
          ... 11 more
      Caused by: java.io.FileNotFoundException: /home/anttkaik/flink/checkpoints/01dbaf21d7c5e8f8eabd3602e086bb89/shared/0a3c0c1d-c924-4e6d-b6ad-463a75c9fce8 (No such file or directory)
          at java.io.FileInputStream.open0(Native Method)
          at java.io.FileInputStream.open(FileInputStream.java:195)
          at java.io.FileInputStream.<init>(FileInputStream.java:138)
          at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)
          at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:143)
          at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:85)
          at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:69)
          at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.downloadDataForStateHandle(RocksDBStateDownloader.java:126)
          at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.lambda$createDownloadRunnables$0(RocksDBStateDownloader.java:109)
          at org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:50)
          at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
          at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:211)
          at java.util.concurrent.CompletableFuture.asyncRunStage(CompletableFuture.java:1654)
          at java.util.concurrent.CompletableFuture.runAsync(CompletableFuture.java:1871)
          at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.downloadDataForAllStateHandles(RocksDBStateDownloader.java:83)
          at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.transferAllStateDataToDirectory(RocksDBStateDownloader.java:66)
          at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.transferRemoteStateToLocalDirectory(RocksDBIncrementalRestoreOperation.java:230)
          at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:195)
          at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.initDBWithRescaling(RocksDBIncrementalRestoreOperation.java:342)
          at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithRescaling(RocksDBIncrementalRestoreOperation.java:276)
          at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:153)
          at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:270)
          ... 15 more

      Job 0fc94de8d94e123585b5baed6972dbe8 was itself restored from an externalized checkpoint generated by job 01dbaf21d7c5e8f8eabd3602e086bb89. After that restore succeeded and 0fc94de8d94e123585b5baed6972dbe8 had generated new externalized checkpoints of its own, I assumed it was safe to delete the checkpoints of 01dbaf21d7c5e8f8eabd3602e086bb89, but apparently I was wrong.

      I have attached the _metadata file from /home/anttkaik/flink/checkpoints/0fc94de8d94e123585b5baed6972dbe8/chk-12. It still contains a reference to /home/anttkaik/flink/checkpoints/01dbaf21d7c5e8f8eabd3602e086bb89/shared/0a3c0c1d-c924-4e6d-b6ad-463a75c9fce8, a file I have deleted.
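
      One rough way to spot such a dependency before deleting an older job's checkpoint directory is to scan the newer checkpoint's _metadata file for paths that point into a different job's shared/ directory. The sketch below is not part of Flink and does not parse the metadata format. It only assumes that referenced paths appear as plain ASCII inside the binary file and that shared files have UUID-style names, as in the path above. The function names `foreign_shared_refs` and `check_checkpoint` are hypothetical.

```python
import re
from pathlib import Path

# Heuristic pattern: "/<32-hex job id>/shared/<UUID-style file name>".
# Assumption: shared state files are UUID-named, as in the reported path.
SHARED_REF = re.compile(
    rb"/([0-9a-f]{32})/shared/"
    rb"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def foreign_shared_refs(metadata: bytes, own_job_id: str) -> set[str]:
    """Return '<job id>/shared/<file>' entries that belong to another job."""
    refs = set()
    for match in SHARED_REF.finditer(metadata):
        job_id = match.group(1).decode("ascii")
        if job_id != own_job_id:
            # Drop the leading "/" so the result is relative to the checkpoint root.
            refs.add(match.group(0)[1:].decode("ascii"))
    return refs


def check_checkpoint(chk_dir: str) -> set[str]:
    """Scan <chk_dir>/_metadata; the job id is taken from the parent directory name."""
    chk_path = Path(chk_dir)
    own_job_id = chk_path.parent.name
    return foreign_shared_refs((chk_path / "_metadata").read_bytes(), own_job_id)
```

      Calling `check_checkpoint` on the chk-12 directory would be expected to list the shared file under 01dbaf21d7c5e8f8eabd3602e086bb89 if the reference is stored as a plain path. An empty result is not proof that the older directory can be deleted, because the scan is only a byte-level heuristic.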

          People

            Assignee: Unassigned
            Reporter: Antti Kaikkonen