Details
- Type: New Feature
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
Description
This is a sub-FLIP for disaggregated state management and its related work. Please read FLIP-423 first for the whole story.
As described in FLIP-423, an embedded state backend on the local file system has some tough issues, especially when dealing with extremely large state:
- Constraints of local disk space complicate the prediction of storage requirements, potentially leading to job failures: Especially in cloud native deployment mode, pre-allocated local disks typically face strict capacity constraints, making it challenging to forecast the size requirements of job states. Over-provisioning disk space results in unnecessary resource overhead, while under-provisioning risks job failure due to insufficient space.
- The tight coupling of compute and storage resources leads to underutilization and increased waste: Jobs can generally be categorized as either CPU-intensive or IO-intensive. In a coupled architecture, CPU-intensive jobs leave a significant portion of storage resources underutilized, whereas IO-intensive jobs result in idle computing resources.
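The utilization argument in the two points above can be illustrated with a toy calculation. This is only a sketch: the node shape (8 cores, 500 GB of local disk) and the two job profiles are made-up numbers, not figures from FLIP-423 or FLIP-427, and the `Job` and `coupled_utilization` names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpu_cores: float  # cores the job actually needs (assumed figure)
    state_gb: float   # state size the job actually holds (assumed figure)


def coupled_utilization(job, cores_per_node, disk_gb_per_node):
    """Coupled model: every node bundles CPU with local disk, so the job
    must take enough nodes to satisfy whichever resource is the bottleneck."""
    nodes = max(
        math.ceil(job.cpu_cores / cores_per_node),
        math.ceil(job.state_gb / disk_gb_per_node),
    )
    cpu_util = job.cpu_cores / (nodes * cores_per_node)
    disk_util = job.state_gb / (nodes * disk_gb_per_node)
    return nodes, cpu_util, disk_util


# Hypothetical job profiles, chosen only to make the imbalance visible.
cpu_job = Job("cpu-intensive", cpu_cores=64, state_gb=200)
io_job = Job("io-intensive", cpu_cores=8, state_gb=4000)

for job in (cpu_job, io_job):
    nodes, cpu_util, disk_util = coupled_utilization(job, 8, 500)
    print(f"{job.name}: nodes={nodes} cpu={cpu_util:.1%} disk={disk_util:.1%}")
```

In this model both jobs occupy the same number of nodes, but each leaves most of one resource idle. Sizing compute and storage independently removes the `max(...)` coupling that causes this.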
By treating remote storage as the primary storage, all working state is maintained on the remote file system, which brings several advantages:
- Remote storage such as S3 or HDFS typically offers elastic scalability, providing theoretically unlimited space.
- The allocation of remote storage resources can be optimized by reducing them for CPU-intensive jobs and augmenting them for IO-intensive jobs, thus enhancing overall resource utilization.
- This architecture facilitates a highly efficient and lightweight process for checkpointing, recovery, and rescaling through fast copy or simple move.
This FLIP aims to realize disaggregated state for our new key-value store named ForSt, which evolves from RocksDB and supports remote file systems. This lets Flink shed the disadvantages of the coupled state architecture and embrace scalable, flexible cloud-native storage.
Please see FLIP-427 for more details.
Issue Links
- is a child of FLINK-34984 FLIP-423: Disaggregated State Storage and Management (Umbrella FLIP) (Open)
- links to