[HDDS-4599] Handle inflight delete/add actions in ReplicationManager properly. - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.1.0
Fix Version/s: None
Component/s: SCM HA
Labels: None

Description

ReplicationManager tracks in-flight replication and deletion actions only in memory; this state is not replicated through Ratis. In theory, if ReplicationManager is started immediately after a failover, the new leader SCM has no record of the actions its predecessor already issued, so we could run into data loss or over-replication.

A quick fix for the potential data loss issue exists in ~~HDDS-4589~~, but we still need a thorough solution that covers both in-flight adds and in-flight deletes.

sodonnell has made two proposals:

1. Have the DNs include a list of pending_delete blocks in their container report / heartbeat, which SCM can then use.
2. When a DN detects a new master SCM or a restarted SCM, have it purge its pending delete list and wait for new instructions from the new/restarted SCM.
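The two proposals could be sketched roughly as follows. This is a minimal illustration only, not Ozone code: the class names `DatanodeState` and `ScmInflightView`, the use of a numeric SCM "term" to detect a new or restarted SCM, and the use of plain `long` IDs for the items pending deletion are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of both proposals; none of these names are real Ozone APIs. */
public class PendingDeleteSketch {

    /** Datanode side. Assumes every SCM command carries the issuing SCM's term. */
    static final class DatanodeState {
        private final Set<Long> pendingDeletes = new HashSet<>();
        private long lastSeenScmTerm = -1;

        /** Proposal 2: a higher term means a new/restarted SCM, so purge and wait. */
        void observeTerm(long scmTerm) {
            if (scmTerm > lastSeenScmTerm) {
                pendingDeletes.clear();
                lastSeenScmTerm = scmTerm;
            }
        }

        /** Queues a delete unless it comes from a stale (older-term) SCM. */
        void onDeleteCommand(long scmTerm, long id) {
            observeTerm(scmTerm);
            if (scmTerm == lastSeenScmTerm) {
                pendingDeletes.add(id);
            }
        }

        /** Proposal 1: the pending list that would ride along with a report/heartbeat. */
        Set<Long> reportPendingDeletes() {
            return new HashSet<>(pendingDeletes);
        }
    }

    /** SCM side. Rebuilds the in-flight delete view from datanode reports. */
    static final class ScmInflightView {
        private final Map<String, Set<Long>> pendingByDatanode = new HashMap<>();

        /** Replaces whatever was previously known for this datanode. */
        void onReport(String datanodeId, Set<Long> pending) {
            pendingByDatanode.put(datanodeId, new HashSet<>(pending));
        }

        /** Number of datanodes that still report the given ID as pending deletion. */
        int inflightDeleteCount(long id) {
            int count = 0;
            for (Set<Long> pending : pendingByDatanode.values()) {
                if (pending.contains(id)) {
                    count++;
                }
            }
            return count;
        }
    }

    public static void main(String[] args) {
        DatanodeState dn = new DatanodeState();
        dn.onDeleteCommand(1L, 42L);

        // Proposal 1: a freshly elected SCM learns about the pending delete from the report.
        ScmInflightView newLeader = new ScmInflightView();
        newLeader.onReport("dn-1", dn.reportPendingDeletes());
        System.out.println("in-flight deletes for 42: " + newLeader.inflightDeleteCount(42L));

        // Proposal 2: the datanode sees a higher term and drops its pending list.
        dn.observeTerm(2L);
        System.out.println("pending after failover: " + dn.reportPendingDeletes());
    }
}
```

Under these assumptions the two proposals are complementary rather than exclusive: the first lets a new leader reconstruct in-flight deletes from what the DNs report, while the second guarantees that there is nothing left in flight for it to miss.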

This Jira is filed to track the problem.

People

Assignee: YI-CHEN WANG

Reporter: Glen Geng

Votes: 0

Watchers: 2

Dates

Created: 16/Dec/20 06:52

Updated: 13/Feb/24 07:59