Flink / FLINK-26306

[Changelog] Thundering herd problem with materialization


Details

    Description

      Quick note: CheckpointCleaner is not involved here.

      When a checkpoint is subsumed, SharedStateRegistry schedules its unused shared state for async deletion. It uses the common IO pool for this and adds one Runnable per state handle (see SharedStateRegistryImpl.scheduleAsyncDelete).

      When a checkpoint is started, CheckpointCoordinator uses the same thread pool to initialize the location for it. (see CheckpointCoordinator.initializeCheckpoint)

      The thread pool has a fixed size (jobmanager.io-pool.size; by default, the number of CPU cores) and uses a FIFO queue for tasks.
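As a possible mitigation for affected deployments, the pool size can be raised explicitly; this is an illustrative flink-conf.yaml fragment (the value 16 is only an example, not a recommendation from this issue):

```yaml
# flink-conf.yaml (illustrative; default is the number of CPU cores)
jobmanager.io-pool.size: 16
```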

      When there is a spike in state deletion, the next checkpoint is delayed, waiting for an available IO thread.

      Back-pressure seems reasonable here (similar to CheckpointCleaner); however, the shared state deletion could be spread across multiple subsequent checkpoints, not necessarily delaying only the next one.
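The head-of-line blocking described above can be reproduced with a plain fixed-size executor. This is a minimal sketch, not Flink code: it assumes a hypothetical pool of 2 threads and a burst of 10 deletion Runnables queued ahead of the checkpoint initialization, mirroring the FIFO behavior of the shared IO pool:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThunderingHerdDemo {

    /** Runs the simulation; returns the position of the checkpoint init in the completion order. */
    static int run() throws Exception {
        // Stand-in for the JM IO pool; the real size is jobmanager.io-pool.size (default: CPU cores)
        ExecutorService ioPool = Executors.newFixedThreadPool(2);
        List<String> completionOrder = Collections.synchronizedList(new ArrayList<>());

        // Spike of shared-state deletions: one Runnable per subsumed state handle
        for (int i = 0; i < 10; i++) {
            final int handle = i;
            ioPool.submit(() -> {
                sleep(50); // simulate a slow remote delete
                completionOrder.add("delete-" + handle);
            });
        }

        // Checkpoint location initialization lands behind all deletions in the FIFO queue
        ioPool.submit(() -> completionOrder.add("init-checkpoint")).get();
        ioPool.shutdown();
        return completionOrder.indexOf("init-checkpoint");
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws Exception {
        // At least 9 of the 10 queued deletions complete before the checkpoint init runs
        System.out.println("init ran at position " + run() + " of 11");
    }
}
```

Spreading the deletions across several checkpoint cycles (or giving checkpoint initialization its own lane) would keep a deletion burst from monopolizing the queue.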

      ---- 

      I believe the issue is a pre-existing one, but it particularly affects the changelog state backend, because 1) such spikes are likely there; 2) workloads are latency-sensitive.

      In the tests, checkpoint duration grows from seconds to minutes immediately after the materialization.


            People

              Assignee: Roman Khachatryan
              Reporter: Roman Khachatryan
