Apache Ozone / HDDS-7759 Improve Ozone Replication Manager / HDDS-8494

Adjust replication queue limits for decommissioning nodes


Details

    Description

      When a node hosting a Ratis container is decommissioned, there are generally 3 sources available for the container replicas: one on the decommissioning host and two on other, effectively random nodes across the cluster. The decommissioning load is therefore shared across many more nodes, which speeds up decommission.

      For an EC container, the decommissioning host is likely the only source of the replica that needs to be copied, so decommission will be slower.

      A decommissioning host is generally not used for Ratis reads unless no other nodes are available, but it is still used for EC reads to avoid online reconstruction. As decommission progresses and new replicas are created, the read load on the node declines over time. Furthermore, decommissioning nodes are not used for writes, so they should be under less load than other cluster nodes.

      Due to this reduced load, it is possible to queue more commands on a decommissioning host and to increase the size of the executor thread pool that processes them.

      When a datanode switches to the decommissioning state, it will increase the size of the replication supervisor thread pool; if the node returns to the In Service state, it will revert to the lower thread pool limit.
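The resizing described above could be sketched as follows. This is only an illustration of the idea, not the actual Ozone implementation: the class name, the `NodeOperationalState` enum, and the thread counts are all hypothetical, and the real replication supervisor has its own configuration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of resizing a replication worker pool when the datanode's
 * operational state changes. All names and limits are illustrative.
 */
public class ReplicationPoolResizer {
  enum NodeOperationalState { IN_SERVICE, DECOMMISSIONING }

  // Hypothetical limits: a larger pool while decommissioning.
  private static final int IN_SERVICE_THREADS = 10;
  private static final int DECOMMISSION_THREADS = 20;

  private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
      IN_SERVICE_THREADS, IN_SERVICE_THREADS,
      60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

  /** Adjust the pool size when the node's operational state changes. */
  public void onStateChange(NodeOperationalState newState) {
    int target = (newState == NodeOperationalState.DECOMMISSIONING)
        ? DECOMMISSION_THREADS : IN_SERVICE_THREADS;
    // Keep core <= max at every step: raise max first when growing,
    // lower core first when shrinking.
    if (target > pool.getMaximumPoolSize()) {
      pool.setMaximumPoolSize(target);
      pool.setCorePoolSize(target);
    } else {
      pool.setCorePoolSize(target);
      pool.setMaximumPoolSize(target);
    }
  }

  public int currentLimit() {
    return pool.getMaximumPoolSize();
  }
}
```

`ThreadPoolExecutor` allows both pool bounds to be changed at runtime, so no queued work is lost when the node moves between states; the order of the two setters just keeps the core size from exceeding the maximum at any point.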

      Similarly, when scheduling commands, SCM can allocate more commands to the decommissioning host, as the host should process them more quickly given its lower load and larger thread pool.
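On the SCM side, the adjustment amounts to raising the per-datanode command queue limit for decommissioning nodes. A minimal sketch, with a hypothetical base limit and multiplier rather than Ozone's real configuration values:

```java
/**
 * Sketch of how SCM might scale the replication command limit for a
 * decommissioning datanode. The base limit and factor are hypothetical.
 */
public class CommandQueueLimits {
  private static final int BASE_COMMAND_LIMIT = 20;
  private static final int DECOMMISSION_FACTOR = 2;

  /** How many replication commands SCM may queue on the node. */
  public static int limitFor(boolean decommissioning) {
    return decommissioning
        ? BASE_COMMAND_LIMIT * DECOMMISSION_FACTOR
        : BASE_COMMAND_LIMIT;
  }

  /** Commands SCM can still schedule, given the node's queue depth. */
  public static int available(boolean decommissioning, int queuedCommands) {
    return Math.max(0, limitFor(decommissioning) - queuedCommands);
  }
}
```

With these example values, an in-service node with 25 queued commands receives no further work, while a decommissioning node with the same backlog can still accept 15 more.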

People

              adoroszlai Attila Doroszlai
              sodonnell Stephen O'Donnell