Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.3.1
-
None
-
None
-
zookeeper c-client
Description
Currently, when a C client get disconnected, it retries a couple of hosts (not all) with no delay between attempts and then if it doesn't succeed it sleeps for 1/3 session expiration timeout period before trying again.
In the worst case the disconnect event can occur after 2/3 of session expiration timeout has past, and sleeping for even more 1/3 session timeout will cause a session loss in most of the times.
A better approach is to check all hosts but with random delay between reconnect attempts. Also the delay must be independent of session timeout so if we increase the session timeout we also increase the number of available attempts.
This improvement covers the case when the C client experiences network problems for a short period of time and is not able to reach any zookeeper hosts.
Java client already uses this logic and works very good.