Currently, a long-lived client can add most or all of the nodes in a small cluster to its exclude list, after which further writes using that client instance will fail. There are two ways this can be improved:
- A timeout after which nodes are removed from the exclude list so that they can be retried. For EC, this exists and defaults to 10 minutes. Ratis does not currently have this, but it should be added. (this task)
- Allow the write to fall back to nodes in the exclude list if that is all that is available. This could be implemented on the server side, or as a retry from the client based on the server's initial response. (extracted to
These issues are especially relevant for the S3 gateway, which uses a persistent Ozone client to connect to the cluster for as long as the gateway is running.
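
To make the first option concrete, here is a minimal sketch of an exclude list whose entries expire after a configurable interval. It is illustrative only: the class name `ExpiringExcludeList`, its methods, and the injected clock are assumptions made for this example and are not the actual Ozone client classes or configuration keys.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of an exclude list with per-entry expiry.
 * Not the real Ozone exclude list; names and structure are assumed.
 */
class ExpiringExcludeList {
  // Node ID mapped to the time (in milliseconds) at which it was excluded.
  private final Map<String, Long> excludedAt = new ConcurrentHashMap<>();
  private final long expiryMillis;
  // Clock is injected so expiry can be exercised without sleeping.
  private final LongSupplier clock;

  ExpiringExcludeList(long expiryMillis, LongSupplier clock) {
    this.expiryMillis = expiryMillis;
    this.clock = clock;
  }

  /** Record a node as excluded, stamped with the current time. */
  void exclude(String nodeId) {
    excludedAt.put(nodeId, clock.getAsLong());
  }

  /** Drop entries older than the expiry interval, then return the rest. */
  Set<String> currentlyExcluded() {
    long now = clock.getAsLong();
    excludedAt.entrySet().removeIf(e -> now - e.getValue() >= expiryMillis);
    return new HashSet<>(excludedAt.keySet());
  }
}
```

With an expiry of 10 minutes (600000 ms), a node excluded by a long-running client becomes eligible for writes again once that interval has passed, instead of staying excluded for the lifetime of the client instance.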