Details
Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
Description
The FLIP: https://cwiki.apache.org/confluence/x/C4owEg
FLIP-423 introduced disaggregated state management, and FLIP-425 introduced a new event-driven execution model for asynchronous state access. This model can significantly boost performance by issuing I/O operations in parallel. However, it also lengthens the draining time during checkpoints, which creates a trade-off between system throughput and checkpoint synchronization delay. This balance can be tuned by adjusting the buffer size.

As a follow-up to FLIP-425, this FLIP proposes a faster way to checkpoint: the state requests waiting in the buffer of the "Asynchronous Execution Controller (AEC)" are snapshotted as part of the checkpoint. Compared with the original plan in FLIP-425, we expect this approach to greatly reduce the draining-time overhead, especially under high back-pressure.

To snapshot state requests, the user-provided callbacks must be persisted across job attempts. This FLIP introduces a new way of declaring element processing in which all callbacks are re-declared and bound to their corresponding previous state requests during the operator's initialization phase. This ensures that the entire pipeline can be accurately restored and that processing resumes smoothly after a job restart.
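The re-declaration idea can be illustrated with a small, self-contained sketch. It is not the Flink API and not the design specified in the FLIP: `Controller`, `PendingRequest`, `declare`, `submit`, `snapshot`, `restore`, and `drain` are hypothetical names chosen for illustration, and state is modeled as a plain in-memory map. The sketch only shows why referencing callbacks by a declared id lets buffered requests be checkpointed and then re-bound on the next job attempt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class DeclaredCallbackSketch {

    /** A buffered state request. It holds only plain data: the callback is referenced by id. */
    static final class PendingRequest {
        final String key;
        final String callbackId;

        PendingRequest(String key, String callbackId) {
            this.key = key;
            this.callbackId = callbackId;
        }
    }

    /** Hypothetical stand-in for the AEC buffer plus the table of declared callbacks. */
    static final class Controller {
        private final Map<String, Consumer<String>> declared = new HashMap<>();
        private final List<PendingRequest> buffer = new ArrayList<>();

        /** Called during operator initialization, on every job attempt. */
        void declare(String callbackId, Consumer<String> callback) {
            declared.put(callbackId, callback);
        }

        /** Buffers a read of the key; the result is later passed to the named callback. */
        void submit(String key, String callbackId) {
            if (!declared.containsKey(callbackId)) {
                throw new IllegalArgumentException("Undeclared callback: " + callbackId);
            }
            buffer.add(new PendingRequest(key, callbackId));
        }

        /** Checkpoint: copy the pending requests instead of draining them. */
        List<PendingRequest> snapshot() {
            return new ArrayList<>(buffer);
        }

        /** Restore: re-bind each pending request to the freshly declared callbacks. */
        void restore(List<PendingRequest> snapshot) {
            for (PendingRequest r : snapshot) {
                submit(r.key, r.callbackId);
            }
        }

        /** Executes all buffered requests against the given state, in submission order. */
        void drain(Map<String, String> state) {
            for (PendingRequest r : buffer) {
                declared.get(r.callbackId).accept(state.get(r.key));
            }
            buffer.clear();
        }
    }

    public static List<String> runDemo() {
        Map<String, String> state = new HashMap<>();
        state.put("a", "1");
        state.put("b", "2");

        // First attempt: requests are buffered, then checkpointed without being drained.
        Controller first = new Controller();
        first.declare("emit", v -> { });
        first.submit("a", "emit");
        first.submit("b", "emit");
        List<PendingRequest> checkpoint = first.snapshot();

        // Second attempt: callbacks are re-declared before the buffer is restored.
        List<String> output = new ArrayList<>();
        Controller second = new Controller();
        second.declare("emit", v -> output.add("value=" + v));
        second.restore(checkpoint);
        second.drain(state);
        return output;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because a lambda cannot be serialized reliably, the snapshot in this sketch stores only the key and the callback id. The restored requests become executable again only after the new attempt has declared a callback under the same id, which is why declaration has to happen during initialization.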