[HDDS-6975] EC: Define the value of Maintenance Redundancy for EC containers - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.3.0
Component/s: SCM
Labels:
- pull-request-available

Description

For Ratis, the number of replicas which must be available when a node goes into maintenance is a simple integer defaulting to 2 in hdds.scm.replication.maintenance.replica.minimum.

This means that for a Ratis container, one out of the 3 nodes can be offline without any replication happening. This can be set to 1, letting two go offline or 3 ensuring full redundancy and hence replication when any node is taken offline.

It could be argued that 1 would be a better default here. With the default placement of 2 replicas on one rack and 1 on another rack, that should allow for a full rack to be taken offline without replication.

For EC, its a little more tricky. Aside from Ratis 1 containers, which are rarely used in practice, EC can tolerate 2 offline (for 3-2), 3 (for 6-3) or 4 (for 10-4).

If we use the same default of 2, that means replication will always be required for 3-2 containers. Also the "number of replicas online" doesn't make as much sense for EC, as each replica is not identical.

EC is also slightly more tricky - when any of the data copies are offline, online reconctruction must be used to read the data, causing a performance penalty, but that cannot be avoided.

If we take the Ratis default of 2 - when there are two replicas out of 3 online, then we have a remaining redundancy of 1 - ie we can afford to lose one more copy and still read data.

If we change the Ratis setting to 1, there is a remaining redundancy of 0, because the loss of another replica renders the data unreadable.

For EC, if we default the setting to a "remaining redundancy" of 1, this would mean we can tolerate a loss of 1 more replicas and still read the data.

This would allow for 3-2 to have 1 replica offline, 6-3 could have 2 and 10-4 could have 3 without any replicaion. In all cases the data redundancy is the same as with Ratis having 2 containers offline.

Additionally, its highly likely online recovery will be needed to read the data, eg if 1 container is offline in 10-4 there is a 10 in 14 (5 in 7) chance its a data container, so trying to keep more containers online for larger EC groups is probably not going to help performance much.

In a large cluster, ideally EC containers will be spread across racks such that there is only 1 replia per rack, so taking a full rack offline would only reduce the redundancy by 1 meaning even 3-2 containers could tolerate a rack going into maintenance.

In summary, I believe the simplest solution, is to have an EC setting hdds.scm.replication.maintenance.ec.remaining.redundancy = 1 which we use for maintenance of EC containers and is basically equivalent to the Ratis default of 2. It may make sense to call the new parameter hdds.scm.replication.maintenance.remaining.redundancy and use the same value for both Ratis and EC, deprecating the old value.

Attachments

Issue Links

causes

HDDS-7201 testContainerIsReplicatedWhenAllNodesGotoMaintenance is failing frequently

Resolved

links to

GitHub Pull Request #3723

Activity

People

Assignee:: Stephen O'Donnell

Reporter:: Stephen O'Donnell

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 01/Jul/22 09:27

Updated:: 06/Sep/22 12:35

Resolved:: 31/Aug/22 20:39