FLINK-5086

Clean dead snapshot files produced by the tasks failing to acknowledge checkpoints

Details

    Description

      A task may fail while performing a checkpoint. By that point, the task may already have copied some data to external storage. Because the failed task never sends the state handle to the CheckpointCoordinator, the coordinator does not know about the copied data and will never delete it.

      I think we need a way to clean up such dead snapshot data; otherwise the usage of external storage can grow without bound.

      One possible method is to clean up these dead files when the task recovers. On recovery, the CheckpointCoordinator tells the task about all retained checkpoints. The task can then scan the external storage and delete every snapshot that does not belong to one of these retained checkpoints.
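
The recovery-time cleanup proposed above could be sketched roughly as follows. This is only an illustration, not Flink code: the class `DeadSnapshotCleaner`, the method `cleanDeadSnapshots`, and the assumption that each snapshot lives in a directory named `chk-<checkpointId>` on a local file system are all hypothetical. How the task actually receives the retained checkpoint IDs from the CheckpointCoordinator is left open here.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of Flink. */
public class DeadSnapshotCleaner {

    // Assumed layout: one directory per checkpoint, named "chk-<checkpointId>".
    private static final String PREFIX = "chk-";

    /**
     * Deletes every snapshot directory under baseDir whose checkpoint ID is not
     * in retainedCheckpointIds, and returns the directories that were deleted.
     */
    public static List<Path> cleanDeadSnapshots(Path baseDir, Set<Long> retainedCheckpointIds)
            throws IOException {
        // First collect the dead snapshots, then delete them after the scan is closed.
        List<Path> dead = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(baseDir, PREFIX + "*")) {
            for (Path entry : entries) {
                Long id = parseCheckpointId(entry.getFileName().toString());
                // Entries with an unparsable name are left untouched.
                if (id != null && !retainedCheckpointIds.contains(id)) {
                    dead.add(entry);
                }
            }
        }
        for (Path snapshot : dead) {
            deleteRecursively(snapshot);
        }
        return dead;
    }

    private static Long parseCheckpointId(String name) {
        try {
            return Long.parseLong(name.substring(PREFIX.length()));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

For example, calling `DeadSnapshotCleaner.cleanDeadSnapshots(baseDir, retainedIds)` with `retainedIds` containing only `1` would remove `chk-2` and keep `chk-1`. A real implementation would have to go through the storage abstraction of the state backend rather than `java.nio.file`, and would need to make sure it only touches files written by the recovering task.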

          People

            Assignee: Unassigned
            Reporter: Xiaogang Shi (shixg)
            Votes: 0
            Watchers: 6
