[FLINK-28843] Fail to find incremental handle when restoring from changelog checkpoint in claim mode - ASF JIRA

When native checkpoint and incremental checkpointing are both enabled in the RocksDB state backend, state data larger than state.storage.fs.memory-threshold is stored in a data file (FileStateHandle, RelativeFileStateHandle, etc.), such as base-path1/chk-1/file1, rather than being stored as a ByteStreamStateHandle in the checkpoint metadata.
Next, restore the job from base-path1/chk-1 in claim mode, using the changelog state backend and with the checkpoint path set to base-path2. The new checkpoint is saved in base-path2/chk-2, but the previous checkpoint file (base-path1/chk-1/file1) is still needed.
Finally, restore the job from base-path2/chk-2 with the changelog state backend. Flink tries to read base-path2/chk-2/file1 rather than the actual file location, base-path1/chk-1/file1, which leads to a FileNotFoundException, and the job fails.
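
The symptom above can be illustrated with a minimal, self-contained sketch. It is not Flink code: the class RelativeHandleSketch and its resolve method are hypothetical, and the sketch assumes the file handle is recorded as a name relative to a checkpoint directory and resolved against whichever checkpoint directory the job is restored from.

```java
// Hypothetical illustration only; none of these names exist in Flink.
class RelativeHandleSketch {

    // Assumption: a relative handle is resolved against the directory
    // of the checkpoint that is currently being restored.
    static String resolve(String checkpointDir, String relativeName) {
        return checkpointDir + "/" + relativeName;
    }

    public static void main(String[] args) {
        // Where the data file was actually written by the first checkpoint.
        String actual = resolve("base-path1/chk-1", "file1");

        // Where the file is looked up when restoring from the second checkpoint.
        String lookedUp = resolve("base-path2/chk-2", "file1");

        System.out.println("actual location: " + actual);
        System.out.println("looked up at:    " + lookedUp);
        System.out.println("paths match:     " + actual.equals(lookedUp));
    }
}
```

Under this assumption the two paths differ, which matches the FileNotFoundException described above: the second checkpoint still depends on a file that lives under the first checkpoint's directory.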

How to reproduce?

Set state.storage.fs.memory-threshold to a small value, like '20b'.
Run org.apache.flink.test.checkpointing.ChangelogPeriodicMaterializationSwitchStateBackendITCase#testSwitchFromDisablingToEnablingInClaimMode.

relates to

FLINK-25872 Restoring from non-changelog checkpoint with changelog state-backend enabled in CLAIM mode discards state in use

FLINK-28699 Native rocksdb full snapshot in non-incremental checkpointing

links to

GitHub Pull Request #20484