Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
We need proper synchronization between Snapshot delete/GC and other Snapshot jobs, e.g. reads from Snapshots and SnapDiff. SnapDiff is a particularly important case because it can be a long-running job, and Snapshot delete/GC can kick in while the job is still in progress.
We should also have uniform behavior across the cluster in case of a failover with concurrent SnapDiff and delete operations. It must not happen that a leader OM node returns one result to a client and, after a failover, the new OM leader returns a different result.
—
Thus, to prevent a client from receiving a partial SnapDiff result without realizing it, and to avoid holding a lock explicitly, we want an approach similar to optimistic locking: check whether the snapshot is still ACTIVE toward the end of the request lifetime, after the SnapDiff service has already collected all the batch entries in a buffer. See the attachment for a timeline of the potential race condition: 35fdc3bd-cd0c-40f3-8fd7-2d8a8dc4643d.pdf
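A minimal sketch of the optimistic check described above, for illustration only. It is not the Ozone implementation: `SnapDiffOptimisticCheck`, `Snapshot`, `Status`, `SnapshotNotActiveException`, `requireActive`, and `diff` are all hypothetical names, the `DELETED` state is an assumed stand-in for whatever non-ACTIVE states exist, and the `afterCollect` hook exists only to simulate a delete/GC racing with the job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class SnapDiffOptimisticCheck {

    /** Hypothetical lifecycle states; only ACTIVE is named in the description. */
    enum Status { ACTIVE, DELETED }

    /** Hypothetical stand-in for snapshot metadata (not an Ozone class). */
    static final class Snapshot {
        final String name;
        final AtomicReference<Status> status = new AtomicReference<>(Status.ACTIVE);

        Snapshot(String name) {
            this.name = name;
        }
    }

    /** Thrown instead of returning a result that may be partial. */
    static final class SnapshotNotActiveException extends RuntimeException {
        SnapshotNotActiveException(String snapshotName) {
            super("Snapshot is no longer ACTIVE: " + snapshotName);
        }
    }

    static void requireActive(Snapshot snapshot) {
        if (snapshot.status.get() != Status.ACTIVE) {
            throw new SnapshotNotActiveException(snapshot.name);
        }
    }

    /**
     * Collects the diff entries into a buffer without holding a lock, then
     * re-validates both snapshots before handing the buffer to the caller.
     * afterCollect runs between collection and validation; it is a test hook
     * that simulates a concurrent snapshot delete/GC.
     */
    static List<String> diff(Snapshot from, Snapshot to,
                             List<String> changedKeys, Runnable afterCollect) {
        requireActive(from);
        requireActive(to);

        // Long-running part of the job: buffer all entries first.
        List<String> buffer = new ArrayList<>(changedKeys);

        afterCollect.run();

        // Optimistic validation toward the end of the request lifetime.
        requireActive(from);
        requireActive(to);
        return buffer;
    }

    public static void main(String[] args) {
        Snapshot from = new Snapshot("snap1");
        Snapshot to = new Snapshot("snap2");
        List<String> keys = Arrays.asList("key1", "key2");

        // No concurrent delete: the full buffer is returned.
        System.out.println(diff(from, to, keys, () -> { }));

        // Delete lands mid-job: the request fails rather than returning
        // a result the client cannot tell is partial.
        try {
            diff(from, to, keys, () -> to.status.set(Status.DELETED));
        } catch (SnapshotNotActiveException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of validating last is that the client gets either a complete result or an explicit error, and no lock is held while the entries are being collected.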
Attachments
Issue Links
- contains: HDDS-8197 [Snapshot] Refactor SnapshotDiffManager#getDeltaFiles (Resolved)
- links to