Description
A NoHostAvailableException occurs in two cases:
1. when the Client is initialized and a failure occurs on all configured Host instances
2. when the Client attempts to chooseConnection() to send a request and all configured Host instances are marked unavailable.
In the first case, the exception carries a cause, which is helpful, but it only reports the failure of the first Host to cause a problem. The second case is worse: the exception has no cause, and it is a "fast fail", in that the request fails as soon as it is sent, with no pause to see whether a Host comes back online. Moreover, a Host can be marked for failure because of the infraction of a single Connection that may have just encountered an intermittent network issue, which can quickly kill the entire ConnectionPool and turn 100s of requests per second into 100s of NoHostAvailableException per second. Note that an infraction can also come from the pool simply being overloaded with requests, which may signal that either the pool or the server is not sized right for the current workload. In either case, NoHostAvailableException is a harsh way to deal with that, and it gives the user few clues as to how to resolve it.
All in all, this situation makes NoHostAvailableException hard to debug. This ticket is meant to help smooth out some of these problems. Initial thoughts for improvements include better logging, ensuring that NoHostAvailableException is never thrown without a cause, preferring more specific exceptions to NoHostAvailableException in the first place, getting rid of "fast fails" in favor of longer pauses to see if a Host can recover, and taking a softer stance on when a Host is actually considered "unavailable".
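As a rough illustration of the "never thrown without a cause" idea, the following sketch builds an exception that keeps the first Host failure as its cause and attaches the remaining Host failures as suppressed exceptions, so that no per-host failure is lost. Every name here (NoHostSketch, NoHostAvailableSketchException, fromHostFailures) is hypothetical and is not the driver's actual API; it only assumes that per-host failures can be collected into a map keyed by host address.

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch: none of these names are the driver's actual API.
final class NoHostSketch {

    /** Stand-in for the driver's NoHostAvailableException, for illustration only. */
    static final class NoHostAvailableSketchException extends RuntimeException {
        NoHostAvailableSketchException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Builds an exception that always has a cause. The first recorded failure
     * becomes the cause; every other failure is attached as a suppressed
     * exception. Assumes the map contains no null values.
     */
    static NoHostAvailableSketchException fromHostFailures(Map<String, Throwable> failuresByHost) {
        Iterator<Throwable> failures = failuresByHost.values().iterator();

        // Fall back to a placeholder so the exception never lacks a cause.
        Throwable cause = failures.hasNext()
                ? failures.next()
                : new IllegalStateException("No per-host failure was recorded");

        NoHostAvailableSketchException ex = new NoHostAvailableSketchException(
                "All hosts are unavailable: " + failuresByHost.keySet(), cause);

        // Keep the remaining per-host failures visible in the stack trace.
        while (failures.hasNext()) {
            ex.addSuppressed(failures.next());
        }
        return ex;
    }

    private NoHostSketch() {}
}
```

A caller could pass an insertion-ordered map (for example a LinkedHashMap) so that the cause is the first Host that failed, then inspect getCause() and getSuppressed() to see what went wrong on each Host. Whether suppressed exceptions or a more specific exception type is the better fit is left open here.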
Expecting to implement this without breaking API changes. Exceptions may shift around a bit, but will try to keep that to a minimum.