Uploaded image for project: 'Apache Ozone'
  1. Apache Ozone
  2. HDDS-2823 SCM HA Support
  3. HDDS-4589

Handle potential data loss during ReplicationManager.handleOverReplicatedContainer()

    XMLWordPrintableJSON

Details

    Description

      Problem: 

      ReplicationManager maintains the in-flight replication and deletion in-memory, which is not replicated using Ratis. So, theoretically it’s possible that we might run into issues if we immediately start ReplicationManager after a failover.

      Scenario: There are 6 replicas of the container C1 namely CR1, CR2, CR3, CR4, CR5 and CR6. The container is over replicated, so the current SCM S1 decides to delete the excess replicas. SCM S1 picks CR1, CR2 and CR3 for deletion, this information is updated in the in-flight deletion list and deletion commands are sent to the datanodes. If there is a failover at this point and SCM S2 becomes leader, it doesn’t have the in-flight deletion list from SCM S1 and it finds the container C1 to be over replicated. Theoretically it’s possible that SCM S2 picks CR4, CR5 and CR6 for deletion. If this happens, we will end up in data loss.

      To address this issue we will make the logic to select a replica for deletion deterministic. This will make sure that the new leader after failover will pick the same replica for deletion which was picked by the old leader. 

      Approach: 

      Sort the candidate replicas. Delete excess replicas from small to large. There will not be any DeleteContainerCommand for the largest 3 Replica sent by any SCM.

      Example:

      Assume there are 6 replicas of the container C1, the factor of C1 is 3, the names of the replicas are CR1, CR2, CR3, CR4, CR5 and CR6. There will not be any DeleteContainerCommand for CR4, CR5, CR6 sent by any SCM:

      If SCM sees less than or equal to 3 replicas, it won’t send any DeleteContainerCommand.

      If SCM sees 4 replicas, It deletes the smallest replica, which means it won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a contradiction.

      If SCM sees 5 replicas, It deletes the smallest 2 replicas, which means it won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a contradiction.

      If SCM sees 6 replicas, It deletes the smallest 3 replicas, which means it won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a contradiction.

      P.S.:

      Since this issue exists in master as well, e.g., a quickly restart of SCM, we decide to fix this problem in master. 

       

      Attachments

        Issue Links

          Activity

            People

              glengeng Glen Geng
              glengeng Glen Geng
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: