Flink / FLINK-10855

CheckpointCoordinator does not delete checkpoint directory of late/failed checkpoints

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.5.5, 1.6.2, 1.7.0
    • Fix Version/s: None
    • Labels:
      None

      Description

      If an acknowledge checkpoint message arrives late, or a checkpoint cannot be acknowledged, the CheckpointCoordinator discards the subtask state. What it does not do in this case is delete the parent directory of the checkpoint. That only happens in PendingCheckpoint#dispose.

      Because of this behaviour, the following sequence can occur: a checkpoint fails (e.g. because a task is not ready) and we delete the checkpoint directory. Next, another task writes its checkpoint data to the checkpoint directory (thereby creating it again) and sends an acknowledge message back to the CheckpointCoordinator. The CheckpointCoordinator realizes that there is no longer a PendingCheckpoint and discards the subtask state. This removes the state files from the checkpoint directory but leaves the now-empty checkpoint directory itself in place.
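
      The sequence described above can be reproduced on a local file system with the small sketch below. It does not use Flink's code: the class and its methods (discardStateOnly, discardStateAndEmptyParent, writeLateState) are hypothetical names chosen for illustration, and the "also delete the parent if it is empty" variant is only one possible remedy, not a confirmed fix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical stand-alone sketch; none of these names exist in Flink.
public class LateCheckpointCleanupSketch {

    /** Mimics the reported behaviour: only the state file is deleted. */
    static void discardStateOnly(Path stateFile) throws IOException {
        Files.deleteIfExists(stateFile);
    }

    /** Assumed remedy: also remove the checkpoint directory if it is now empty. */
    static void discardStateAndEmptyParent(Path stateFile) throws IOException {
        Files.deleteIfExists(stateFile);
        Path parent = stateFile.getParent();
        if (parent != null && Files.isDirectory(parent) && isEmpty(parent)) {
            Files.deleteIfExists(parent);
        }
    }

    /** Returns true if the directory has no entries. */
    static boolean isEmpty(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        }
    }

    /** Simulates a late task: recreates the directory and writes its state. */
    static Path writeLateState(Path checkpointDir) throws IOException {
        Files.createDirectories(checkpointDir);
        return Files.write(checkpointDir.resolve("subtask-state"), new byte[] {1, 2, 3});
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("checkpoints");
        Path chk = root.resolve("chk-42"); // directory of an already failed checkpoint

        // Reported behaviour: the state file is gone, the directory stays.
        Path state = writeLateState(chk);
        discardStateOnly(state);
        System.out.println("directory left behind: " + Files.exists(chk)); // true

        // Assumed remedy: the empty directory is removed as well.
        Files.deleteIfExists(chk);
        state = writeLateState(chk);
        discardStateAndEmptyParent(state);
        System.out.println("directory left behind: " + Files.exists(chk)); // false

        Files.deleteIfExists(root);
    }
}
```

      On a shared or distributed file system the emptiness check and the delete are not atomic, so a real fix would still have to tolerate other late tasks writing into the same directory concurrently.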

              People

              • Assignee: yanghua vinoyang
              • Reporter: trohrmann Till Rohrmann
