Apache Ozone / HDDS-8699 Further Replication Manager Improvements / HDDS-9258

LegacyReplicationManager: Pending deletes on unhealthy replicas can cause calculation errors

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Component: SCM

    Description

      In RatisContainerReplicaCount, should we exclude pending deletes that target replicas which LegacyReplicationManager (LRM) sees as unhealthy? Since UNHEALTHY replicas are already ignored in the count, it makes sense not to count their pending deletes either.

      Suppose there's a CLOSED container with replicas:
      CLOSED, CLOSED, CLOSED, UNHEALTHY (not counted, seen as excess that can be deleted).

      In the current iteration, RM sends a delete command for the UNHEALTHY replica, so there is now a pending delete. In the next iteration, if that delete is still pending, RM sees 3 CLOSED replicas - 1 pending delete + 1 UNHEALTHY replica. Because UNHEALTHY replicas are ignored, this is effectively 3 CLOSED replicas - 1 pending delete, even though the delete targets the UNHEALTHY replica. The effective count becomes 2, so the container is seen as under-replicated although it still has 3 CLOSED replicas. This still needs to be verified as an actual bug; I have not written any tests to reproduce it yet.
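
      The arithmetic above can be illustrated with a small, self-contained sketch. This is a hypothetical model, not the real RatisContainerReplicaCount: the class PendingDeleteSketch, its Replica type, and the methods naiveCount and adjustedCount are names invented for illustration, and the real class tracks more state than this. It only contrasts subtracting every pending delete with subtracting only those that target counted replicas.

```java
import java.util.List;

// Hypothetical, simplified model of the scenario in this issue.
// It is NOT the actual RatisContainerReplicaCount implementation.
final class PendingDeleteSketch {

  enum State { CLOSED, UNHEALTHY }

  // A replica plus a flag saying whether a delete command is pending for it.
  static final class Replica {
    final State state;
    final boolean deletePending;

    Replica(State state, boolean deletePending) {
      this.state = state;
      this.deletePending = deletePending;
    }
  }

  // Behaviour suspected in the issue: UNHEALTHY replicas are ignored,
  // but every pending delete is subtracted, whatever replica it targets.
  static int naiveCount(List<Replica> replicas) {
    int counted = 0;
    int pendingDeletes = 0;
    for (Replica r : replicas) {
      if (r.state != State.UNHEALTHY) {
        counted++;
      }
      if (r.deletePending) {
        pendingDeletes++;
      }
    }
    return counted - pendingDeletes;
  }

  // Proposed behaviour: a pending delete only reduces the count when it
  // targets a replica that was counted in the first place.
  static int adjustedCount(List<Replica> replicas) {
    int counted = 0;
    for (Replica r : replicas) {
      if (r.state != State.UNHEALTHY && !r.deletePending) {
        counted++;
      }
    }
    return counted;
  }

  // The example from the description: CLOSED, CLOSED, CLOSED, UNHEALTHY,
  // with a delete pending on the UNHEALTHY replica only.
  static List<Replica> example() {
    return List.of(
        new Replica(State.CLOSED, false),
        new Replica(State.CLOSED, false),
        new Replica(State.CLOSED, false),
        new Replica(State.UNHEALTHY, true));
  }
}
```

      With the example replicas, naiveCount yields 2 (reported as under-replicated against a replication factor of 3), while adjustedCount yields 3.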

          People

            Assignee: Unassigned
            Reporter: Siddhant Sangwan