Details
-
Bug
-
Status: Patch Available
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
If a client has a 2 server list, and is currently connected to the last server in that list, and that server then goes offline, the addrvec_next() call handle_error() will push the client to the start of the list and terminate the connection.
Then, the zoo_cycle_next_server() call in zookeeper_interest will be called in response to the connection failure, and the client will cycle back to the failed server.
In this way, a client who has a list of only 2 servers can get stuck on the one failed server. This would only be an issue in an ensemble larger than 2 of course, because failing 1 out of 2 would lead to quorum loss anyway.
There are other harmonics possible if every other server in the list is failed, but this is simplest to reproduce in a 3 server ensemble where the client only knows about 2 servers, one of which then fails. There are probably some elegant fixes here, but I think the simplest is to add a flag to track whether a server has been accessed before, and if it hasn't, don't call zoo_cycle_next_server() at the top of the zookeeper_interest() function.
Attachments
Attachments
Issue Links
- relates to
-
ZOOKEEPER-2466 Client skips servers when trying to connect
- Patch Available