If a request sends no timeAllowed threshold, or sends a very generous one, then that request can potentially be retried on a large number of servers in the cloud.
The HttpShardHandlerFactory options loadBalancerRequestsMinimumAbsolute and loadBalancerRequestsMaximumFraction restrict, via configuration, the number of servers tried.
For example, a minimum absolute of 2 and a maximum fraction of 0.50 would, on a collection/shard with six replicas all active, restrict sending to three replicas, i.e. max(2, 0.50 x 6). If the collection/shard temporarily has three replicas active and three recovering or down, sending is temporarily restricted to two replicas, i.e. max(2, 0.50 x 3).
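The values used in the example above (a minimum absolute of 2 and a maximum fraction of 0.50) might be configured in solrconfig.xml roughly as sketched below. Only the two option names and their values come from the text above; the element types (int, float) and the class attribute value are assumptions, so check them against the reference documentation for your Solr version.

```xml
<!-- Sketch only: element types and the class attribute are assumed, not taken from the text above. -->
<shardHandlerFactory class="HttpShardHandlerFactory">
  <!-- Always allow at least this many servers to be tried. -->
  <int name="loadBalancerRequestsMinimumAbsolute">2</int>
  <!-- Otherwise try at most this fraction of the currently active replicas. -->
  <float name="loadBalancerRequestsMaximumFraction">0.50</float>
</shardHandlerFactory>
```

With these values the limit is the larger of the two bounds: max(2, 0.50 x 6) = 3 with six active replicas, and max(2, 0.50 x 3) = max(2, 1.5) = 2 with three.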