Flink / FLINK-36118

FLIP-455: Declare async state processing and checkpoint the in-flight requests

Details

    Description

      The FLIP: https://cwiki.apache.org/confluence/x/C4owEg

      FLIP-423 introduced disaggregated state management, and FLIP-425 introduced a new execution model that accesses state asynchronously in an event-driven way. This model can significantly boost performance by leveraging parallel I/O operations. However, it also increases draining time during checkpoints, because the in-flight state requests have to be drained before the checkpoint is taken. The result is a trade-off between system throughput and checkpoint synchronization delay, which can be calibrated by adjusting the buffer size.

      As a follow-up to FLIP-425, this FLIP proposes a faster way to checkpoint: snapshotting the state requests that are waiting in the buffer of the "Asynchronous Execution Controller (AEC)" as part of the checkpoint. With this approach, we expect a substantial reduction in draining-time overhead compared with the original plan in FLIP-425, especially under high back-pressure. To snapshot state requests, the user's callbacks must be persisted across job attempts. This FLIP therefore introduces a novel approach to declaring element processing, in which all callbacks are re-declared and bound to their corresponding previous state requests during the operator's initialization phase. This ensures that the entire pipeline can be accurately restored and that processing resumes smoothly after a job restart.
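
      The re-declaration idea can be illustrated with a minimal, self-contained sketch. It is not the API proposed in the FLIP: `DeclaredCallbackSketch`, `PendingRequest`, `CallbackRegistry`, `declareProcessing`, and the callback ID `"emit-count"` are all hypothetical names chosen for illustration. The sketch assumes that a buffered request stores only serializable data plus a stable callback ID, and that the operator rebuilds the ID-to-callback mapping each time it initializes, so that requests restored from a checkpoint can be bound to freshly declared callbacks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch only; none of these types are Flink APIs. */
public class DeclaredCallbackSketch {

    /** A buffered state request: plain data and a callback ID, never a lambda. */
    static final class PendingRequest {
        final String callbackId;
        final String key;
        final long stateValue;

        PendingRequest(String callbackId, String key, long stateValue) {
            this.callbackId = callbackId;
            this.key = key;
            this.stateValue = stateValue;
        }
    }

    /** Maps stable callback IDs to code; rebuilt on every operator initialization. */
    static final class CallbackRegistry {
        private final Map<String, Consumer<PendingRequest>> callbacks = new HashMap<>();

        void declare(String id, Consumer<PendingRequest> callback) {
            if (callbacks.putIfAbsent(id, callback) != null) {
                throw new IllegalStateException("Duplicate callback id: " + id);
            }
        }

        /** Binds a (possibly restored) request to its declared callback and runs it. */
        void complete(PendingRequest request) {
            Consumer<PendingRequest> callback = callbacks.get(request.callbackId);
            if (callback == null) {
                throw new IllegalStateException("Undeclared callback id: " + request.callbackId);
            }
            callback.accept(request);
        }
    }

    /** Declaration step: runs on the first start and again after every restore. */
    static CallbackRegistry declareProcessing(List<String> output) {
        CallbackRegistry registry = new CallbackRegistry();
        registry.declare("emit-count", r -> output.add(r.key + "=" + (r.stateValue + 1)));
        return registry;
    }

    public static void main(String[] args) {
        // Attempt 1: this request was still buffered when the checkpoint was taken,
        // so it was written into the snapshot instead of being drained.
        List<PendingRequest> snapshot = new ArrayList<>();
        snapshot.add(new PendingRequest("emit-count", "user-1", 41L));

        // Attempt 2: re-declare the callbacks, then resume the restored requests.
        List<String> output = new ArrayList<>();
        CallbackRegistry registry = declareProcessing(output);
        for (PendingRequest request : snapshot) {
            registry.complete(request);
        }
        System.out.println(output); // prints [user-1=42]
    }
}
```

      The point of the design choice in this sketch is that a lambda cannot be written into a checkpoint, but a stable identifier can. As long as the declaration step is deterministic across job attempts, a restored request finds the same logic it was originally issued with.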

            People

              zakelly Zakelly Lan