Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Implemented
None
Description
When a node hosting a Ratis container is decommissioned, there are generally 3 sources available for the container replicas: one on the decommissioning host, and 2 others on somewhat random nodes across the cluster. This allows the decommissioning load, and hence the speed of decommission, to be shared across many more nodes.
For an EC container, the decommissioning host is likely the only source of the replica which needs to be copied, and hence the decommission will be slower.
A host which is decommissioning is generally not used for Ratis reads unless there are no other nodes available, but it would still be used for EC reads to avoid online reconstruction. As decommission progresses on the node, and new copies are formed, the read load will decline over time. Furthermore, decommissioning nodes are not used for writes, so they should be under less load than other cluster nodes.
Due to the reduced load on a decommissioning host, it is possible to increase the number of commands queued on a decommissioning host and also increase the size of the executor thread pool to process the commands.
When a datanode switches to a decommissioning state, it will increase the size of the replication supervisor thread pool, and if the node returns to the In Service state, it will return to the lower thread pool limit.
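The resize on state change could look roughly like the sketch below. It is not the actual replication supervisor code: the class name `ResizableReplicationPool`, the simplified `OpState` enum, and the `baseThreads` and `scaleFactor` parameters are all hypothetical, chosen only to illustrate growing and shrinking a `ThreadPoolExecutor` safely.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the real Ozone replication supervisor. */
class ResizableReplicationPool {
  /** Simplified operational states (illustrative names). */
  enum OpState { IN_SERVICE, DECOMMISSIONING }

  private final int baseThreads;   // assumed pool size while In Service
  private final int scaleFactor;   // assumed multiplier while decommissioning
  private final ThreadPoolExecutor executor;

  ResizableReplicationPool(int baseThreads, int scaleFactor) {
    this.baseThreads = baseThreads;
    this.scaleFactor = scaleFactor;
    this.executor = new ThreadPoolExecutor(baseThreads, baseThreads,
        60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
  }

  /** Called when the node's operational state changes. */
  void onStateChange(OpState state) {
    int target = state == OpState.DECOMMISSIONING
        ? baseThreads * scaleFactor
        : baseThreads;
    resize(target);
  }

  private void resize(int target) {
    // ThreadPoolExecutor rejects core > max (and max < core), so the
    // order of the two setters depends on the direction of the change.
    if (target > executor.getMaximumPoolSize()) {
      executor.setMaximumPoolSize(target);
      executor.setCorePoolSize(target);
    } else {
      executor.setCorePoolSize(target);
      executor.setMaximumPoolSize(target);
    }
  }

  int poolSize() {
    return executor.getCorePoolSize();
  }

  void shutdown() {
    executor.shutdown();
  }
}
```

With `baseThreads = 10` and `scaleFactor = 2` (both made-up values), the pool would grow to 20 threads on entering decommissioning and drop back to 10 on returning to In Service.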
Similarly, when scheduling commands, SCM can allocate more commands to the decommissioning host, as it should process them more quickly due to the lower load and the larger thread pool.
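On the scheduling side, the idea can be sketched as a per-node command limit that is scaled up for decommissioning hosts. This is only an illustration under assumed names: `ReplicationCommandLimiter`, `baseLimit`, and `scaleFactor` are hypothetical and do not reflect the actual SCM classes or configuration keys.

```java
/** Hypothetical sketch of a per-datanode replication command limit. */
class ReplicationCommandLimiter {
  /** Simplified operational states (illustrative names). */
  enum OpState { IN_SERVICE, DECOMMISSIONING }

  private final int baseLimit;    // assumed limit for In Service nodes
  private final int scaleFactor;  // assumed multiplier for decommissioning nodes

  ReplicationCommandLimiter(int baseLimit, int scaleFactor) {
    this.baseLimit = baseLimit;
    this.scaleFactor = scaleFactor;
  }

  /** Maximum number of commands that may be queued on a node in this state. */
  int limitFor(OpState state) {
    return state == OpState.DECOMMISSIONING
        ? baseLimit * scaleFactor
        : baseLimit;
  }

  /** True if one more command can be sent to a node with this many queued. */
  boolean canSchedule(OpState state, int queuedCommands) {
    return queuedCommands < limitFor(state);
  }
}
```

For example, with a made-up `baseLimit` of 20 and `scaleFactor` of 2, a node with 25 queued commands would be skipped while In Service but would still accept work while decommissioning.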
Issue Links
- relates to
  - HDDS-8532 Add config for factor of scaling up replication queue/threads in decommissioning nodes (Resolved)
  - HDDS-10237 Dynamic reconfiguration of replication supervisor thread pool (Open)
- links to