FLINK-33090

CheckpointsCleaner: clean individual checkpoint states in parallel


Details

      When disposing of checkpoints that are no longer needed, every state handle/state file is now disposed of in parallel by the ioExecutor, which substantially speeds up disposal of a single checkpoint (for large checkpoints, the disposal time can drop from 10 minutes to under 1 minute). The old behaviour can be restored by setting `state.checkpoint.cleaner.parallel-mode` to `false`.
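
      As a sketch, opting back into the old sequential behaviour in the Flink configuration file might look like the fragment below. It assumes the flat `key: value` format of that file; only the option name and value come from the note above.

      ```yaml
      # Restore sequential disposal of the state handles within a checkpoint.
      state.checkpoint.cleaner.parallel-mode: false
      ```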

    Description

      Currently, CheckpointsCleaner cleans multiple checkpoints in parallel using the JobManager's ioExecutor; however, the states within each checkpoint are cleaned sequentially. With thousands of StateObjects to clean, this can take a long time on some checkpoint storage. If cleanup takes longer than the checkpoint interval, it prevents new checkpoints from being taken.

      The proposal is to use the same ioExecutor to clean up each checkpoint's states in parallel as well. In my local testing, with the default settings for the ioExecutor thread pool and xK state files, this reduced cleanup time from 10 minutes to under 1 minute.
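
      The idea can be illustrated with a minimal, self-contained sketch. This is not Flink's actual implementation: `ParallelDiscardSketch`, `DiscardableState`, and the 4-thread pool are hypothetical stand-ins for the cleaner, a state handle, and the ioExecutor. It only shows the general pattern of submitting one discard task per state to a shared executor and waiting for all of them.

      ```java
      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.CompletableFuture;
      import java.util.concurrent.CompletionException;
      import java.util.concurrent.Executor;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.atomic.AtomicInteger;

      /** Illustrative only: these types are hypothetical, not Flink classes. */
      class ParallelDiscardSketch {

          /** Hypothetical stand-in for one state handle / state file. */
          interface DiscardableState {
              void discard() throws Exception;
          }

          /** Sequential baseline: discard one state after another. */
          static void discardSequentially(List<DiscardableState> states) throws Exception {
              for (DiscardableState state : states) {
                  state.discard();
              }
          }

          /**
           * Parallel variant: submit one discard task per state to the shared
           * executor and return a future that completes when all have finished.
           */
          static CompletableFuture<Void> discardInParallel(
                  List<DiscardableState> states, Executor ioExecutor) {
              List<CompletableFuture<Void>> futures = new ArrayList<>();
              for (DiscardableState state : states) {
                  futures.add(CompletableFuture.runAsync(() -> {
                      try {
                          state.discard();
                      } catch (Exception e) {
                          // Surface the failure through the returned future.
                          throw new CompletionException(e);
                      }
                  }, ioExecutor));
              }
              return CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]));
          }

          /** Discards 1000 fake states on a 4-thread pool; returns how many ran. */
          static int demo() {
              AtomicInteger discarded = new AtomicInteger();
              List<DiscardableState> states = new ArrayList<>();
              for (int i = 0; i < 1000; i++) {
                  states.add(discarded::incrementAndGet);
              }
              ExecutorService pool = Executors.newFixedThreadPool(4);
              try {
                  discardInParallel(states, pool).join();
              } finally {
                  pool.shutdown();
              }
              return discarded.get();
          }
      }
      ```

      Because each discard is typically an I/O-bound call against checkpoint storage, fanning the calls out over a pool lets their latencies overlap, which is where the reported speedup would come from.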

            People

              yigress Yi Zhang