Flink / FLINK-14339

The checkpoint ID count is wrong in the savepoint restore log


Details

    • Bug
    • Status: Reopened
    • Not a Priority
    • Resolution: Unresolved
    • 1.8.0
    • None
    • None

    Description

      I saw the following log output when testing a Flink restore from a savepoint.

      [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Recovering checkpoints from ZooKeeper.
      [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Found 0 checkpoints in ZooKeeper.
      [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying to fetch 0 checkpoints from storage.
      [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Starting job 00000000000000000000000000000000 from savepoint /nfsdata/ecs/flink-savepoints/flink-savepoint-test/00000000000000000000000000000000/201910080158/savepoint-000000-003c9b080832 (allowing non restored state)
      [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Reset the checkpoint ID of job 00000000000000000000000000000000 to 12285.
      [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Recovering checkpoints from ZooKeeper.
      [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Found 1 checkpoints in ZooKeeper.
      [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying to fetch 1 checkpoints from storage.
      [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying to retrieve checkpoint 12284.
      [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Restoring job 00000000000000000000000000000000 from latest valid checkpoint: Checkpoint 12284 @ 0 for 00000000000000000000000000000000.
      

      The checkpoint that is finally restored has ID 12284, but the log prints "Reset the checkpoint ID of job 00000000000000000000000000000000 to 12285". So I checked the source code.

      // Reset the checkpoint ID counter
      long nextCheckpointId = savepoint.getCheckpointID() + 1;
      checkpointIdCounter.setCount(nextCheckpointId);
      
      LOG.info("Reset the checkpoint ID of job {} to {}.", job, nextCheckpointId);
      

      I think the log message should print the savepoint's checkpoint ID instead of the next checkpoint ID.

      LOG.info("Reset the checkpoint ID of job {} to {}.", job, savepoint.getCheckpointID());
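
      To make the two numbers in the log easier to compare, here is a minimal, self-contained sketch of the reset step. It is not Flink code: SketchCheckpointIdCounter and RestoreLogSketch are hypothetical stand-ins, and the only logic taken from the snippet above is "savepoint ID + 1, then setCount". It shows that 12284 is the ID of the restored savepoint, while 12285 is the value the counter holds afterwards, which is presumably the ID the next new checkpoint would receive.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the checkpoint ID counter; not Flink's actual class.
final class SketchCheckpointIdCounter {
    private final AtomicLong next = new AtomicLong(1);

    // Sets the ID that the next triggered checkpoint will receive.
    void setCount(long newCount) {
        next.set(newCount);
    }

    // Returns the current ID and advances the counter by one.
    long getAndIncrement() {
        return next.getAndIncrement();
    }
}

final class RestoreLogSketch {
    // Mirrors the quoted snippet: the counter is moved one past the savepoint's ID,
    // and that incremented value is what the current log message prints.
    static long resetCounter(SketchCheckpointIdCounter counter, long savepointCheckpointId) {
        long nextCheckpointId = savepointCheckpointId + 1;
        counter.setCount(nextCheckpointId);
        return nextCheckpointId;
    }

    public static void main(String[] args) {
        long savepointCheckpointId = 12284L; // ID of the checkpoint restored in the log above
        SketchCheckpointIdCounter counter = new SketchCheckpointIdCounter();
        long loggedValue = resetCounter(counter, savepointCheckpointId);

        System.out.println("Restored checkpoint ID: " + savepointCheckpointId);
        System.out.println("Value in the 'Reset the checkpoint ID' message: " + loggedValue);
        System.out.println("ID handed to the next new checkpoint: " + counter.getAndIncrement());
    }
}
```

      Under these assumptions the message reports the counter value rather than the restored ID, so either printing savepoint.getCheckpointID() or rewording the message to say "next checkpoint ID" would remove the apparent mismatch.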
      

          People

            Assignee: Unassigned
            Reporter: Kuncle king's uncle
            Votes: 0
            Watchers: 3
