Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
Description
Currently, SharedStateRegistryImpl discards the old state when a new state is registered under the same key:
    // Old entry is not in a confirmed checkpoint yet, and the new one differs.
    // This might result from (omitted KG range here for simplicity):
    // 1. Flink recovers from a failure using a checkpoint 1
    // 2. State Backend is initialized to UID xyz and a set of SST: { 01.sst }
    // 3. JM triggers checkpoint 2
    // 4. TM sends handle: "xyz-002.sst"; JM registers it under "xyz-002.sst"
    // 5. TM crashes; everything is repeated from (2)
    // 6. TM recovers from CP 1 again: backend UID "xyz", SST { 01.sst }
    // 7. JM triggers checkpoint 3
    // 8. TM sends NEW state "xyz-002.sst"
    // 9. JM discards it as duplicate
    // 10. checkpoint completes, but a wrong SST file is used
    // So we use a new entry and discard the old one:
    LOG.info(
            "Duplicated registration under key {} of a new state: {}. "
                    + "This might happen during the task failover if state backend creates different states with the same key before and after the failure. "
                    + "Discarding the OLD state and keeping the NEW one which is included into a completed checkpoint",
            registrationKey,
            newHandle);
    scheduledStateDeletion = entry.stateHandle;
    entry.stateHandle = newHandle;
But if execution.checkpointing.max-concurrent-checkpoints > 1, the following case fails (taking RocksDBStateBackend as an example):
- cp1 is triggered: 1.sst is uploaded to file-1 and registered as <1.sst,file-1>; cp1 references file-1.
- cp1 is not yet complete when cp2 is triggered: 1.sst is uploaded again, this time to file-2, and cp2 tries to register <1.sst,file-2>. SharedStateRegistry discards file-1.
- cp1 completes and cp2 fails, but cp1 is now broken because file-1 has been deleted.
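The three steps above can be replayed with a toy model. This is only a minimal sketch: ToyRegistry is a hypothetical stand-in that keeps one file per key and deletes the old file on replacement, mirroring the snippet quoted earlier; it is not Flink's SharedStateRegistryImpl and leaves out key-group ranges, asynchronous deletion and checkpoint subsumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy replay of the scenario; all class and method names here are hypothetical. */
public class ConcurrentCheckpointRepro {

    /** One handle per key; a differing earlier handle is deleted on re-registration. */
    static final class ToyRegistry {
        private final Map<String, String> fileByKey = new HashMap<>();
        final Set<String> deletedFiles = new HashSet<>();

        void register(String key, String file) {
            String old = fileByKey.put(key, file);
            if (old != null && !old.equals(file)) {
                // Mirrors "Discarding the OLD state and keeping the NEW one".
                deletedFiles.add(old);
            }
        }
    }

    /** Runs the steps from the list and reports whether cp1's file is still present. */
    static boolean cp1IsIntact() {
        ToyRegistry registry = new ToyRegistry();
        registry.register("1.sst", "file-1"); // cp1 (pending) references file-1
        registry.register("1.sst", "file-2"); // cp2 (pending) uploads 1.sst again
        // cp1 completes and cp2 fails: cp1 still points at file-1.
        return !registry.deletedFiles.contains("file-1");
    }

    public static void main(String[] args) {
        System.out.println("cp1 intact: " + cp1IsIntact()); // prints "cp1 intact: false"
    }
}
```

In this model the second registration deletes file-1 even though cp1 is the checkpoint that eventually completes.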
I added a test to reproduce the problem (pr-22606).
I think we should allow registering multiple state objects under the same key. WDYT pnowojski, roman?
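One possible reading of this proposal, sketched under assumptions: each key maps to several files, each file remembers which checkpoints reference it, and a file is deleted only once no checkpoint references it. MultiHandleRegistry, register and release are hypothetical names for illustration. This is not Flink's API and not necessarily the fix adopted in FLINK-29913.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of "multiple state objects per key"; names are hypothetical. */
public class MultiHandleRegistrySketch {

    static final class MultiHandleRegistry {
        // key -> (file -> ids of checkpoints that still reference the file)
        private final Map<String, Map<String, Set<Long>>> entries = new HashMap<>();
        final Set<String> deletedFiles = new HashSet<>();

        /** Adds a reference from the checkpoint to the file; never replaces other files. */
        void register(String key, String file, long checkpointId) {
            entries.computeIfAbsent(key, k -> new HashMap<>())
                    .computeIfAbsent(file, f -> new HashSet<>())
                    .add(checkpointId);
        }

        /** Drops a failed or subsumed checkpoint and deletes files left unreferenced. */
        void release(long checkpointId) {
            for (Map<String, Set<Long>> files : entries.values()) {
                files.entrySet().removeIf(e -> {
                    e.getValue().remove(checkpointId);
                    if (e.getValue().isEmpty()) {
                        deletedFiles.add(e.getKey());
                        return true;
                    }
                    return false;
                });
            }
        }
    }

    /** Replays the scenario from the description: cp1 completes, cp2 fails. */
    static Set<String> deletedAfterCp2Fails() {
        MultiHandleRegistry registry = new MultiHandleRegistry();
        registry.register("1.sst", "file-1", 1L); // cp1
        registry.register("1.sst", "file-2", 2L); // cp2, same key, different file
        registry.release(2L);                     // cp2 failed
        return registry.deletedFiles;
    }

    public static void main(String[] args) {
        System.out.println(deletedAfterCp2Fails()); // prints "[file-2]"
    }
}
```

With per-file reference tracking, the failed cp2 only removes file-2, and file-1 stays available to cp1.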
Issue Links
- duplicates: FLINK-29913 Shared state would be discarded by mistake when maxConcurrentCheckpoint>1 (Closed)
- links to