Flink / FLINK-24621

JobManager fails to recover 1.13.1 checkpoint due to InflightDataRescalingDescriptor

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.13.2, 1.13.3
    • Fix Version/s: 1.13.6
    • None

    Description

      A user reported on the mailing list that a JobManager (JM) is unable to read a 1.13.1 checkpoint.
      https://lists.apache.org/thread/wnxfpfhr5gkmovjctf7bdf8xmf7qmwlb

      The big question is why the InflightDataRescalingDescriptor is creating problems, because it should not actually be contained in a checkpoint.

      Caused by: org.apache.flink.util.FlinkException: Could not retrieve checkpoint 2844 from state handle under checkpointID-0000000000000002844. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
      at org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore.retrieveCompletedCheckpoint(DefaultCompletedCheckpointStore.java:309) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore.recover(DefaultCompletedCheckpointStore.java:151) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreLatestCheckpointedStateInternal(CheckpointCoordinator.java:1513) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreInitialCheckpointIfPresent(CheckpointCoordinator.java:1476) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.createAndRestoreExecutionGraph(DefaultExecutionGraphFactory.java:134) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:342) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:190) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:122) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:132) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:340) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:317) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:107) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
      at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) ~[?:?]
      at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
      at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
      at java.lang.Thread.run(Unknown Source) ~[?:?]
      Caused by: java.io.InvalidClassException: org.apache.flink.runtime.checkpoint.InflightDataRescalingDescriptor$NoRescalingDescriptor; local class incompatible: stream classdesc serialVersionUID = -5544173933105855751, local class serialVersionUID = 1
      at java.io.ObjectStreamClass.initNonProxy(Unknown Source) ~[?:?]
      at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source) ~[?:?]
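
The innermost cause in the trace above is Java serialization rejecting a class descriptor whose serialVersionUID differs from that of the class on the local classpath. The following is a minimal sketch of that mechanism only. It is not Flink code: `SerialVersionUidDemo`, `Marker`, and `withStreamUid` are hypothetical names introduced for illustration. It assumes the standard Java object stream layout (4 header bytes, two tag bytes, a length-prefixed class name, then the 8-byte UID), and it borrows the UID value from the trace purely as an example. It does not explain why the descriptor ended up in the checkpoint.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SerialVersionUidDemo {

    // Hypothetical stand-in for a serializable descriptor class; not Flink code.
    static class Marker implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Writes a single object using plain Java serialization.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Reads a single object back; throws InvalidClassException on a UID mismatch.
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Returns a copy of the stream whose class descriptor carries a different UID,
    // simulating bytes written by another version of the same class.
    // Assumed layout: magic + version (4 bytes), TC_OBJECT (1), TC_CLASSDESC (1),
    // class name length (2), class name bytes, then the 8-byte serialVersionUID.
    static byte[] withStreamUid(byte[] bytes, Class<?> type, long uid) {
        int nameLength = type.getName().getBytes(StandardCharsets.UTF_8).length;
        int uidOffset = 4 + 1 + 1 + 2 + nameLength;
        byte[] copy = bytes.clone();
        ByteBuffer.wrap(copy).putLong(uidOffset, uid); // big-endian, like DataOutput
        return copy;
    }

    public static void main(String[] args) throws Exception {
        byte[] written = serialize(new Marker());
        System.out.println("same UID: " + deserialize(written).getClass().getSimpleName());

        // UID value copied from the stack trace above, used here only as an example.
        byte[] fromOtherVersion = withStreamUid(written, Marker.class, -5544173933105855751L);
        try {
            deserialize(fromOtherVersion);
        } catch (InvalidClassException e) {
            System.out.println("different UID: " + e.getMessage());
        }
    }
}
```

The sketch shows that the check happens while the class descriptor is being read, before any field data is touched, so even a field-less class is rejected when the two UIDs differ.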
      

          People

            Assignee: Dawid Wysakowicz (dwysakowicz)
            Reporter: Chesnay Schepler (chesnay)