[HDDS-8471] Ensure replication processors use a single queue for each iteration - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Implemented
Affects Version/s: None
Fix Version/s: 1.4.0
Component/s: None
Labels:
- pull-request-available

Description

The under- and over-replication queues in ReplicationManager are created when ReplicationManager checks the health of all containers in the system. Each time it does so, it creates a new ReplicationQueue object wrapping the under- and over-replicated queues.
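The queue-swapping behavior described above can be sketched as follows. `SimpleReplicationQueue` and `SimpleReplicationManager` are hypothetical stand-ins, not Ozone's actual classes. The sketch only illustrates that every full health check builds a fresh queue object and replaces the previous one.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for ReplicationQueue: one queue of under-replicated
// and one of over-replicated container IDs.
class SimpleReplicationQueue {
  private final Queue<Long> underReplicated = new ConcurrentLinkedQueue<>();
  private final Queue<Long> overReplicated = new ConcurrentLinkedQueue<>();

  void enqueueUnder(long containerId) { underReplicated.add(containerId); }
  void enqueueOver(long containerId) { overReplicated.add(containerId); }
  Long dequeueUnder() { return underReplicated.poll(); }  // null when empty
  Long dequeueOver() { return overReplicated.poll(); }    // null when empty
  int underSize() { return underReplicated.size(); }
  int overSize() { return overReplicated.size(); }
}

// Hypothetical manager: each full health check builds a fresh queue and
// swaps it in, orphaning whichever queue object was there before.
class SimpleReplicationManager {
  private final AtomicReference<SimpleReplicationQueue> current =
      new AtomicReference<>(new SimpleReplicationQueue());

  void checkAllContainers(List<Long> underIds, List<Long> overIds) {
    SimpleReplicationQueue fresh = new SimpleReplicationQueue();
    for (long id : underIds) { fresh.enqueueUnder(id); }
    for (long id : overIds) { fresh.enqueueOver(id); }
    current.set(fresh);
  }

  // Processors read the current queue through this accessor.
  SimpleReplicationQueue getQueue() { return current.get(); }
}
```

Any processor still holding a reference to the old queue keeps working on an object the manager no longer reads. This is the root of the problem described below.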

The OverReplicatedProcessor and UnderReplicatedProcessor both extend UnhealthyReplicationProcessor, which dequeues messages and processes them. If processing a message throws an exception, the message is saved in a list so it can be requeued later. The message is saved rather than requeued immediately so that a container which fails repeatedly cannot send the processor into an infinite loop.
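The drain-then-requeue pattern can be sketched roughly like this. `DrainingProcessor` is a hypothetical class, not Ozone's UnhealthyReplicationProcessor. Container IDs are modelled as `Long`, and a handler signals failure by throwing a `RuntimeException`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.LongConsumer;

// Hypothetical sketch of the drain-then-requeue loop; not Ozone's actual code.
class DrainingProcessor {
  // Processes every queued container ID; returns how many failed and were requeued.
  static int processAll(Queue<Long> queue, LongConsumer handler) {
    List<Long> failed = new ArrayList<>();
    Long containerId;
    while ((containerId = queue.poll()) != null) {
      try {
        handler.accept(containerId);
      } catch (RuntimeException e) {
        failed.add(containerId);  // set aside instead of requeuing immediately
      }
    }
    // Requeue only after the queue is drained. Requeuing inside the loop would
    // let a container that always fails be polled again forever.
    queue.addAll(failed);
    return failed.size();
  }
}
```

For example, draining a queue holding containers 1, 2 and 3, where container 2 throws, leaves only container 2 on the queue for the next run.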

The problem is that while the under- or over-replication processor is running, it may be accumulating failed containers to requeue. In the meantime, ReplicationManager may finish processing all the containers and replace the queue. The failed containers are then requeued onto the new queue, possibly creating duplicates.
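The race can be reproduced deterministically with a small sketch. `QueueHolder` and `RacyProcessor` are hypothetical. The `beforeRequeue` hook stands in for ReplicationManager replacing the queue while the processor is still running. The bug is that failures are requeued onto whatever queue the holder references at requeue time, rather than the queue they were taken from.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.LongConsumer;

// Hypothetical holder for the manager's current queue.
class QueueHolder {
  volatile Queue<Long> queue;
}

class RacyProcessor {
  // beforeRequeue is a test hook standing in for ReplicationManager replacing
  // the queue while this processor is still running.
  static void processAll(QueueHolder holder, LongConsumer handler, Runnable beforeRequeue) {
    Queue<Long> queue = holder.queue;
    List<Long> failed = new ArrayList<>();
    Long id;
    while ((id = queue.poll()) != null) {
      try {
        handler.accept(id);
      } catch (RuntimeException e) {
        failed.add(id);
      }
    }
    beforeRequeue.run();
    // Bug: re-reads the holder, so failures land on the *new* queue, which
    // may already contain the same containers.
    holder.queue.addAll(failed);
  }
}
```

If container 7 fails and the fresh health check has already queued container 7 again, the new queue ends up holding it twice.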

While the duplicates should not cause any problems, it would be better to handle this case more gracefully.

For example, if the queue has been replaced, the failed containers could simply be dropped. The open question is how to detect that the queue has been replaced.
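One possible answer, in the spirit of this issue's title, is for each processor iteration to capture the queue reference once and compare it by identity (`!=`) against the manager's current queue before requeuing. This is a sketch under assumptions, not necessarily what PR #4627 implemented. `QueueHolder` and `SingleQueueProcessor` are hypothetical names, and `beforeRequeue` is a test hook simulating a concurrent queue replacement.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.LongConsumer;

// Hypothetical holder for the manager's current queue.
class QueueHolder {
  volatile Queue<Long> queue;
}

class SingleQueueProcessor {
  // Returns true if failures were requeued, false if they were dropped
  // because the queue was replaced during this iteration.
  static boolean processAll(QueueHolder holder, LongConsumer handler, Runnable beforeRequeue) {
    Queue<Long> queue = holder.queue;  // captured once for this iteration
    List<Long> failed = new ArrayList<>();
    Long id;
    while ((id = queue.poll()) != null) {
      try {
        handler.accept(id);
      } catch (RuntimeException e) {
        failed.add(id);
      }
    }
    beforeRequeue.run();
    if (holder.queue != queue) {
      // Replaced: the new queue came from a fresh health check, which
      // presumably already includes these containers if still unhealthy.
      return false;
    }
    queue.addAll(failed);  // always the captured queue, never a re-read
    return true;
  }
}
```

Failures are only ever added to the captured `queue`, so even a replacement that slips in between the identity check and `addAll` is harmless. In that case the failures land on the orphaned queue, which nothing reads any more. The identity check simply makes the drop explicit and avoids useless work.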

Issue Links

links to

GitHub Pull Request #4627

People

Assignee: Attila Doroszlai

Reporter: Stephen O'Donnell

Votes: 0

Watchers: 2

Dates

Created: 21/Apr/23 10:32

Updated: 28/Apr/23 14:30

Resolved: 28/Apr/23 14:30