Details
Type: Sub-task
Status: Closed
Priority: Blocker
Resolution: Fixed
1.3.0
None
Description
Each incremental RocksDB checkpoint n registers its new and existing shared state with the SharedStateRegistry when it completes. Only then is the backend notified, so that all following checkpoints (n+x) can reference the new state from checkpoint n.
However, when a checkpoint n+1 starts before n has been confirmed to the backend, n+1 can treat some files as new even though they were already contained in n. It will upload those files to DFS again, creating new state handles.
Then, once n+1 completes, it can try to register some state as new that was already registered by n, without n+1 knowing about this. Currently this violates a precondition check that the reference count for state assumed to be new is 1.
While we cannot prevent duplicate uploads, we must resolve this situation in the SharedStateRegistry.
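One possible way to resolve the situation can be sketched as follows. This is a minimal illustration and not Flink's actual implementation: the class names SketchStateHandle and SketchSharedStateRegistry, the register and referenceCount methods, and the string keys are all hypothetical. The sketch assumes the registry is keyed by a stable identifier for the shared file. When a second handle arrives for a key that is already registered, the registry keeps the existing handle, discards the duplicate upload, and increments the reference count instead of failing a precondition.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a handle to a file uploaded to DFS. */
final class SketchStateHandle {
    final String path;
    boolean discarded = false;

    SketchStateHandle(String path) {
        this.path = path;
    }

    /** In a real system this would delete the uploaded file; here it only sets a flag. */
    void discard() {
        discarded = true;
    }
}

/** Hypothetical, simplified registry; not the real SharedStateRegistry API. */
final class SketchSharedStateRegistry {

    private static final class Entry {
        final SketchStateHandle handle;
        int referenceCount = 1;

        Entry(SketchStateHandle handle) {
            this.handle = handle;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /**
     * Registers state under the given key and returns the handle the caller
     * should reference from now on. This may differ from the candidate if an
     * earlier checkpoint already registered the same state.
     */
    synchronized SketchStateHandle register(String key, SketchStateHandle candidate) {
        Entry entry = entries.get(key);
        if (entry == null) {
            // First registration: the state really is new.
            entries.put(key, new Entry(candidate));
            return candidate;
        }
        if (entry.handle != candidate) {
            // Duplicate upload (e.g. n+1 started before n was confirmed):
            // drop the redundant copy and keep the handle registered first.
            candidate.discard();
        }
        entry.referenceCount++;
        return entry.handle;
    }

    /** Returns the current reference count for the key, or 0 if it is unknown. */
    synchronized int referenceCount(String key) {
        Entry entry = entries.get(key);
        return entry == null ? 0 : entry.referenceCount;
    }
}
```

With this approach, checkpoint n+1 would have to replace its own handle with the one returned by register, so that later checkpoints all reference the same file in DFS.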