Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Implemented
Affects Version/s: None
Fix Version/s: None
Description
The under- and over-replication queues in ReplicationManager are rebuilt each time ReplicationManager checks the health of all containers in the system. On each such pass, it creates a new ReplicationQueue object wrapping fresh under- and over-replicated queues.
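The sketch below illustrates the queue-replacement pattern described above. It assumes the current queue is published through an `AtomicReference` that each health-check pass overwrites with a newly built object. `SimpleReplicationQueue` and `ReplicationManagerSketch` are hypothetical, simplified stand-ins, not Ozone's actual classes, and container IDs are modelled as plain `long` values.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for a ReplicationQueue:
// container IDs are plain longs and the deques are not thread-safe.
class SimpleReplicationQueue {
  final Deque<Long> underReplicated = new ArrayDeque<>();
  final Deque<Long> overReplicated = new ArrayDeque<>();
}

// Hypothetical sketch of how a health-check pass could publish a new queue.
class ReplicationManagerSketch {
  private final AtomicReference<SimpleReplicationQueue> current =
      new AtomicReference<>(new SimpleReplicationQueue());

  SimpleReplicationQueue getQueue() {
    return current.get();
  }

  // Build a fresh queue from this pass's results, then swap it in.
  // Any reference to the previous queue still held elsewhere is now stale.
  void processAll(Collection<Long> under, Collection<Long> over) {
    SimpleReplicationQueue fresh = new SimpleReplicationQueue();
    fresh.underReplicated.addAll(under);
    fresh.overReplicated.addAll(over);
    current.set(fresh);
  }
}
```

The key property is that a pass never mutates the old queue; it replaces it, which is exactly what allows a processor to end up holding failures that belong to a queue nobody reads any more.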
OverReplicatedProcessor and UnderReplicatedProcessor both extend UnhealthyReplicationProcessor, which dequeues messages and processes them. If processing a message throws an exception, the processor saves the message in a list to be requeued later. It defers the requeue, rather than putting the message back on the queue immediately, so that a container that fails repeatedly cannot trap the processor in an infinite loop.
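A minimal sketch of the dequeue, process and deferred-requeue loop, assuming a single processor thread and a plain `Deque<Long>` of container IDs. `DeferredRequeueProcessor` and `processAll` are hypothetical names chosen for illustration, not the actual UnhealthyReplicationProcessor API.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical, simplified version of the processing loop.
class DeferredRequeueProcessor {
  // Drains the queue once and returns the number of IDs handled successfully.
  static int processAll(Deque<Long> queue, LongConsumer handler) {
    List<Long> failed = new ArrayList<>();
    int processed = 0;
    Long id;
    while ((id = queue.poll()) != null) {
      try {
        handler.accept(id);
        processed++;
      } catch (RuntimeException e) {
        // Defer: putting the ID straight back would let a container that
        // always fails be polled again in this same loop, forever.
        failed.add(id);
      }
    }
    // Requeue failures only after the drain, for a later run to retry.
    queue.addAll(failed);
    return processed;
  }
}
```

Because failures are added back only after the drain finishes, a container that always throws is attempted once per call rather than indefinitely.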
The issue is that while the under- or over-replication processor is running, it may be collecting failed containers to requeue while ReplicationManager finishes processing all containers and replaces the queue. The failed containers are then requeued onto the new queue, which may already contain them, creating duplicates.
While the duplicates should not cause any problems, it would be better to handle this case more gracefully.
For example, if the queue has been replaced, the failed containers could simply be dropped. The open question is how to detect that the queue has been replaced.
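One possible answer, sketched under stated assumptions: capture the queue instance at the start of a run and compare it by reference (`==`) with the current queue before requeuing. If the two differ, the queue was replaced and the failures are dropped. `QueueHolder` and `ReplacementAwareProcessor` are hypothetical. The sketch assumes the queue is published through an `AtomicReference`, and that the new health-check pass re-evaluates every container, so a dropped failure is picked up again if the container is still unhealthy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongConsumer;

// Hypothetical holder for the current queue; a health-check pass would
// call replace() with a freshly built queue.
class QueueHolder {
  private final AtomicReference<Deque<Long>> current =
      new AtomicReference<>(new ArrayDeque<>());

  Deque<Long> get() { return current.get(); }

  void replace(Deque<Long> fresh) { current.set(fresh); }
}

class ReplacementAwareProcessor {
  // Returns the number of failed IDs that were requeued.
  static int run(QueueHolder holder, LongConsumer handler) {
    // Remember which queue instance this run started with.
    Deque<Long> startQueue = holder.get();
    List<Long> failed = new ArrayList<>();
    Long id;
    while ((id = startQueue.poll()) != null) {
      try {
        handler.accept(id);
      } catch (RuntimeException e) {
        failed.add(id);
      }
    }
    // Identity check: a different instance means a new queue was
    // published during this run, so drop the failures instead of
    // adding duplicates to the new queue.
    if (holder.get() != startQueue) {
      return 0;
    }
    // Requeue onto the captured queue, never onto holder.get(): if a
    // replacement races with this line, the failures land in the
    // discarded queue and are dropped, which is the intended outcome.
    startQueue.addAll(failed);
    return failed.size();
  }
}
```

A monotonically increasing generation number stored next to the queue would be an alternative to the identity check and is easier to log; comparing references simply avoids adding a new field.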