Apache Ozone: HDDS-8471 (sub-task of HDDS-7759, Improve Ozone Replication Manager)

Ensure replication processors use a single queue for each iteration


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None

    Description

      The under- and over-replication queues in ReplicationManager are created when ReplicationManager checks the health of all containers in the system. Each time it does so, it builds a new "ReplicationQueue" object wrapping the under- and over-replicated queues.
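      To make the replacement concrete, here is a minimal Java sketch assuming a simplified model: container IDs are plain longs, and the manager publishes each new wrapper through an AtomicReference. The class names ReplicationQueueSketch and ManagerSketch are hypothetical stand-ins, not Ozone's actual ReplicationQueue or ReplicationManager APIs.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the wrapper: one object holding both the
// under- and over-replicated queues produced by a single health check.
class ReplicationQueueSketch {
  final Queue<Long> underReplicated = new ArrayDeque<>();
  final Queue<Long> overReplicated = new ArrayDeque<>();
}

// Hypothetical manager: every health check builds a fresh wrapper and
// publishes it by swapping the reference, discarding the previous object.
class ManagerSketch {
  private final AtomicReference<ReplicationQueueSketch> current =
      new AtomicReference<>(new ReplicationQueueSketch());

  void checkAllContainers(Iterable<Long> under, Iterable<Long> over) {
    ReplicationQueueSketch fresh = new ReplicationQueueSketch();
    under.forEach(fresh.underReplicated::add);
    over.forEach(fresh.overReplicated::add);
    // The old queue object is replaced, not updated in place.
    current.set(fresh);
  }

  ReplicationQueueSketch getQueue() {
    return current.get();
  }
}
```

      The point to notice is that a health check swaps in a different object rather than updating the existing one, so any code still holding the old reference keeps working against a queue the manager has already discarded.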

      OverReplicatedProcessor and UnderReplicatedProcessor both extend UnhealthyReplicationProcessor, which dequeues messages and processes them. If processing a message throws an exception, the message is saved in a list so it can be enqueued again later. It is saved rather than re-enqueued immediately so that a container which fails repeatedly cannot trap the processor in an infinite loop over the queue.
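      The deferred requeue in UnhealthyReplicationProcessor can be sketched as follows. This is an illustration under assumptions, not Ozone's code: ProcessorLoopSketch and its Handler interface are hypothetical, a handler signals failure by throwing, and the queue is a plain single-threaded ArrayDeque.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of the dequeue / process / deferred-requeue loop.
class ProcessorLoopSketch {
  // Hypothetical callback: returns normally on success, throws on failure.
  interface Handler {
    void process(long containerId) throws Exception;
  }

  /** Drains the queue once and returns the number of containers that failed. */
  static int processAll(Queue<Long> queue, Handler handler) {
    List<Long> failed = new ArrayList<>();
    Long id;
    while ((id = queue.poll()) != null) {
      try {
        handler.process(id);
      } catch (Exception e) {
        // Defer the requeue: adding the container back now would let one
        // that always fails be polled again forever inside this loop.
        failed.add(id);
      }
    }
    // Only after the queue is drained are the failures put back.
    queue.addAll(failed);
    return failed.size();
  }
}
```

      Requeueing only after the drain means a container that always fails is attempted once per pass instead of being polled again and again within the same pass.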

      The problem is that while the under- or over-replication processor is running, it may be accumulating failed containers to requeue just as ReplicationManager finishes checking all containers and replaces the queue. The failed containers are then requeued onto the new queue, possibly creating duplicates.
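      The interleaving described above can be replayed deterministically in a single thread. In this hypothetical sketch (RequeueRaceSketch is illustrative only), the local variable current plays the role of the queue reference that ReplicationManager swaps, and container 7 is assumed to still be unhealthy when the second health check runs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical, single-threaded replay of the problematic interleaving.
class RequeueRaceSketch {
  static List<Long> replay() {
    Queue<Long> current = new ArrayDeque<>(List.of(7L)); // health check #1
    List<Long> failed = new ArrayList<>();

    Long id = current.poll(); // processor takes container 7 ...
    failed.add(id);           // ... and processing it fails

    // Health check #2 finishes and replaces the queue; container 7 is
    // still unhealthy, so the new queue lists it again.
    current = new ArrayDeque<>(List.of(7L));

    // The processor requeues its saved failures onto whatever queue is
    // current now, which is the new one.
    current.addAll(failed);
    return new ArrayList<>(current);
  }
}
```

      replay() returns [7, 7]: container 7 appears once from the fresh health check and once more from the stale requeue.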

      The duplicates should not cause any problems, but it would be better to handle this case more gracefully.

      For example, if the queue has been replaced, the processor could drop the failed containers. The open question is how to detect that the queue has been replaced.
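      The issue title points at one approach: have each processor use a single queue for the whole iteration. The following is a sketch under that assumption, not the Ozone patch. SingleQueuePerIterationSketch and Handler are hypothetical, and the shared queue reference is modelled as an AtomicReference. The processor captures the queue once, polls and requeues against that same object, and uses a reference comparison to detect replacement, dropping the failed containers when a newer queue has been published.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: one queue object is used for the whole iteration.
class SingleQueuePerIterationSketch {
  // Hypothetical callback: returns normally on success, throws on failure.
  interface Handler {
    void process(long containerId) throws Exception;
  }

  /**
   * Runs one iteration. Returns true if failed containers were requeued,
   * false if they were dropped because the queue had been replaced.
   */
  static boolean runIteration(AtomicReference<Queue<Long>> shared, Handler handler) {
    Queue<Long> queue = shared.get(); // captured once for this iteration
    List<Long> failed = new ArrayList<>();
    Long id;
    while ((id = queue.poll()) != null) {
      try {
        handler.process(id);
      } catch (Exception e) {
        failed.add(id);
      }
    }
    // Reference comparison answers "has the queue been replaced?". A newer
    // queue already reflects a fresh health check, so drop stale failures.
    if (shared.get() != queue) {
      return false;
    }
    // Requeue onto the captured object. Even if a swap happens right after
    // the check above, the new queue never receives these stale entries.
    queue.addAll(failed);
    return true;
  }
}
```

      Because addAll always targets the captured object, a replacement that happens after the identity check still cannot leak stale entries into the new queue; the check mainly lets the caller see that failures were dropped.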

People

    • Assignee: Attila Doroszlai (adoroszlai)
    • Reporter: Stephen O'Donnell (sodonnell)
    • Votes: 0
    • Watchers: 2
