[HDDS-8390] Synchronization between Snapshot Deletes/GC and other Snapshot jobs (read/diff) - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4.0
Component/s: Snapshot
Labels:
- pull-request-available

Description

We need proper synchronization between Snapshot delete/GC and other Snapshot jobs, e.g. reads from Snapshots and SnapDiff. SnapDiff is a particularly important case because it can be a long-running job, and Snapshot delete/GC can kick in while the job is still in progress.

We should also have uniform behavior across the cluster in case of a failover with concurrent SnapDiff and delete operations. It should not happen that the leader OM node returns one result to a client but, after a failover, the new OM leader returns a different result.

—

Thus, to prevent a client from unknowingly receiving a partial SnapDiff result, and to avoid explicitly holding a lock, we want to use an approach similar to optimistic locking: check whether the snapshot is still ACTIVE towards the end of the request lifetime, once the SnapDiff service has already collected all the batch entries in a buffer. See the attachment for a timeline of the potential race condition: 35fdc3bd-cd0c-40f3-8fd7-2d8a8dc4643d.pdf
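The optimistic check described above could be sketched roughly as follows. This is an illustration only, not the code from the linked pull request: the class `SnapDiffOptimisticCheck`, the in-memory `SNAPSHOT_TABLE`, the `SnapshotStatus` enum, and the `concurrentEvent` hook are all hypothetical stand-ins, and a real OM would read snapshot state from its metadata store rather than from a map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SnapDiffOptimisticCheck {

  // Hypothetical snapshot states; only ACTIVE snapshots may serve a diff.
  enum SnapshotStatus { ACTIVE, DELETED }

  // Hypothetical stand-in for the OM snapshot metadata table.
  static final Map<String, SnapshotStatus> SNAPSHOT_TABLE = new ConcurrentHashMap<>();

  // Throws if the snapshot is missing or no longer ACTIVE.
  static void requireActive(String snapshot) {
    if (SNAPSHOT_TABLE.get(snapshot) != SnapshotStatus.ACTIVE) {
      throw new IllegalStateException("Snapshot is not ACTIVE: " + snapshot);
    }
  }

  // Collects diff entries into a buffer without holding a lock, then
  // re-validates both snapshots before returning anything to the caller.
  // concurrentEvent simulates work (such as a delete) that happens while
  // the long-running diff is in progress.
  static List<String> snapDiff(String from, String to,
                               List<String> entries, Runnable concurrentEvent) {
    // Fail fast if either snapshot is already gone.
    requireActive(from);
    requireActive(to);

    // Long-running phase: buffer all entries.
    List<String> buffer = new ArrayList<>();
    for (String entry : entries) {
      buffer.add(entry);
    }
    concurrentEvent.run();

    // Optimistic check at the end of the request lifetime: if a snapshot
    // was deleted in the meantime, the buffer may be partial, so reject.
    requireActive(from);
    requireActive(to);
    return buffer;
  }

  public static void main(String[] args) {
    SNAPSHOT_TABLE.put("snap1", SnapshotStatus.ACTIVE);
    SNAPSHOT_TABLE.put("snap2", SnapshotStatus.ACTIVE);
    List<String> entries = Arrays.asList("+ key1", "- key2");

    // No concurrent delete: the buffered result is returned.
    System.out.println(snapDiff("snap1", "snap2", entries, () -> { }));

    // Delete lands mid-diff: the request is rejected instead of
    // returning a possibly partial result.
    try {
      snapDiff("snap1", "snap2", entries,
          () -> SNAPSHOT_TABLE.put("snap1", SnapshotStatus.DELETED));
    } catch (IllegalStateException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

The point of validating twice is that the first check only saves wasted work, while the second is what guarantees the client never sees a result computed against a snapshot that was deleted during the job. Failing the whole request, rather than returning what was buffered, also keeps the outcome easier to reason about across a failover.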

Attachments

35fdc3bd-cd0c-40f3-8fd7-2d8a8dc4643d.pdf
14/Apr/23 15:27
31 kB
Siyao Meng

Issue Links

contains

HDDS-8197 [Snapshot] Refactor SnapshotDiffManager#getDeltaFiles

Resolved

links to

GitHub Pull Request #4617

People

Assignee: Hemant Kumar

Reporter: Prashant Pogde

Votes: 0

Watchers: 1

Dates

Created: 05/Apr/23 23:51

Updated: 22/Jun/23 21:26

Resolved: 08/May/23 21:03