[FLINK-6533] Duplicated registration of new shared state when checkpoint confirmations are still pending - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.3.0
Fix Version/s: 1.3.0
Component/s: Runtime / State Backends
Labels:
None

Description

Each incremental RocksDB checkpoint n registers its new and existing shared state with the SharedStateRegistry when it completes. Only then is the backend notified, and only from that point on can the following checkpoints (n+x) reference the new state in checkpoint n.

However, when checkpoint n+1 starts before n has been confirmed to the backend, n+1 can treat some files as new even though they were already contained in n. It then uploads those files to DFS again, creating new state handles.

Then, once n+1 completes, it can try to register some state as new that was already registered by n, without n+1 knowing about it. Currently this violates a precondition check that the reference count for state assumed to be new is 1.

While we cannot prevent duplicate uploads, we must resolve this situation in the SharedStateRegistry.
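One way to picture such a resolution is a registry that tolerates a second "new" registration for a key it already knows, instead of asserting a reference count of 1. The following is a minimal sketch only, not Flink's actual SharedStateRegistry: the class `SharedStateRegistrySketch`, the `Handle` type, the string keys, and the `discardedHandles()` list are all hypothetical names introduced for illustration. It assumes that the first registered handle stays authoritative and that a later duplicate upload is marked for deletion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical stand-in for a shared state registry; not Flink's class. */
class SharedStateRegistrySketch {

    /** Hypothetical handle that identifies one uploaded file in DFS. */
    static final class Handle {
        final String dfsPath;

        Handle(String dfsPath) {
            this.dfsPath = dfsPath;
        }
    }

    /** Registered handle plus the number of checkpoints referencing it. */
    private static final class Entry {
        final Handle handle;
        int refCount = 1;

        Entry(Handle handle) {
            this.handle = handle;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final List<Handle> discarded = new ArrayList<>();

    /**
     * Registers shared state under the given key and returns the handle the
     * calling checkpoint should reference from now on.
     */
    Handle register(String key, Handle handle) {
        Entry entry = entries.get(key);
        if (entry == null) {
            // First registration: the state really is new, reference count is 1.
            entries.put(key, new Entry(handle));
            return handle;
        }
        // The key is already known, e.g. registered by checkpoint n while n+1
        // was still running. Count the extra reference instead of failing.
        entry.refCount++;
        if (entry.handle != handle) {
            // Assumption: the duplicate upload is redundant and can be deleted.
            discarded.add(handle);
        }
        return entry.handle;
    }

    /** Returns the current reference count for a key, or 0 if it is unknown. */
    int refCount(String key) {
        Entry entry = entries.get(key);
        return entry == null ? 0 : entry.refCount;
    }

    /** Returns the duplicate handles that were replaced by an earlier registration. */
    List<Handle> discardedHandles() {
        return discarded;
    }
}
```

In this sketch, if checkpoint n registers a file under some key and checkpoint n+1 later registers a re-uploaded copy under the same key, the second call returns the handle from n, the reference count becomes 2, and the copy uploaded by n+1 ends up in the discard list.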

People

Assignee: Stefan Richter

Reporter: Stefan Richter

Votes: 0

Watchers: 1

Dates

Created: 11/May/17 08:39

Updated: 14/May/17 12:25

Resolved: 14/May/17 12:25