Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.5.5
Description
StaticHostProvider.updateServerList contains address matching like this:
    for (InetSocketAddress addr : shuffledList) {
        if (addr.getPort() == myServer.getPort()
                && ((addr.getAddress() != null
                        && myServer.getAddress() != null
                        && addr.getAddress().equals(myServer.getAddress()))
                    || addr.getHostString().equals(myServer.getHostString()))) {
            myServerInNewConfig = true;
            break;
        }
    }
The addresses in shuffledList are unresolved, while the current server address in myServer is a resolved address (coming from a socket). If the connect string is expressed in terms of IP addresses instead of host names, the two won't match even when they represent the same server.
On the unresolved addresses, getAddress() is null, and getHostString() is something like 1.2.3.4. On the resolved address, getAddress() is not null, and getHostString() is (normally) the canonical host name corresponding to the IP address.
As a result, this method tends to return true (reconfig) when it should not. The calling method, ZooKeeper.updateServerList, then closes the connection.
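The mismatch can be reproduced without a ZooKeeper client or any DNS lookup. The following is a minimal sketch, not code from ZooKeeper itself: the class and helper names are made up for illustration, and the host name zk1.example.com is a hypothetical stand-in for whatever name the socket address ends up carrying. The matches method copies the condition from the loop quoted above.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class AddressMatchDemo {
    // Same condition as the loop body in StaticHostProvider.updateServerList,
    // extracted into a method for illustration.
    static boolean matches(InetSocketAddress addr, InetSocketAddress myServer) {
        return addr.getPort() == myServer.getPort()
                && ((addr.getAddress() != null
                        && myServer.getAddress() != null
                        && addr.getAddress().equals(myServer.getAddress()))
                    || addr.getHostString().equals(myServer.getHostString()));
    }

    // What the connect string "1.2.3.4:2181" turns into: an unresolved address.
    static InetSocketAddress fromConnectString() {
        return InetSocketAddress.createUnresolved("1.2.3.4", 2181);
    }

    // Stand-in for the socket's remote address: resolved, with a host name
    // attached. The name is hypothetical; no lookup is performed here.
    static InetSocketAddress fromSocket() {
        try {
            InetAddress ip = InetAddress.getByAddress(
                    "zk1.example.com", new byte[] {1, 2, 3, 4});
            return new InetSocketAddress(ip, 2181);
        } catch (UnknownHostException e) {
            // Only thrown for a byte array of illegal length.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InetSocketAddress configured = fromConnectString();
        InetSocketAddress current = fromSocket();

        System.out.println(configured.getAddress());      // null (unresolved)
        System.out.println(configured.getHostString());   // 1.2.3.4
        System.out.println(current.getHostString());      // zk1.example.com
        System.out.println(matches(configured, current)); // false, same server
    }
}
```

The getAddress() branch is skipped because the configured address is unresolved, and the getHostString() branch compares an IP literal with a host name, so neither side of the || can succeed.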
This might be written off as not too serious, except that Curator calls this method whenever there is a connection state change (sometimes many times). What we observe is that when the client has to reconnect, e.g. after a server failure, the socket gets closed right away once it reconnects. The client goes into a cycle of death until the session dies and a new one is created. (This doesn't seem like very nice behaviour on Curator's part, but that's what's out there.)
As a workaround, we implemented a custom HostProvider to filter out calls to updateServerList which don't actually change the list.
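The filtering part of such a workaround might look roughly like the sketch below. This is not our actual implementation; ServerListChangeFilter and shouldForward are hypothetical names, and the class deliberately has no ZooKeeper dependency so it can be read on its own. The assumption is that a custom HostProvider would consult it at the top of updateServerList, return false when the list is unchanged, and otherwise delegate to a wrapped StaticHostProvider.

```java
import java.net.InetSocketAddress;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: remembers the last server list and reports whether
// a newly supplied list actually differs from it.
public class ServerListChangeFilter {
    private Set<String> lastSeen;

    public ServerListChangeFilter(Collection<InetSocketAddress> initial) {
        this.lastSeen = toKeys(initial);
    }

    // Returns true only when the set of host:port pairs has changed.
    // Ordering and duplicates are ignored.
    public synchronized boolean shouldForward(Collection<InetSocketAddress> servers) {
        Set<String> keys = toKeys(servers);
        if (keys.equals(lastSeen)) {
            return false;
        }
        lastSeen = keys;
        return true;
    }

    // Compare on the configured host string and port so that no DNS
    // resolution is involved.
    private static Set<String> toKeys(Collection<InetSocketAddress> servers) {
        Set<String> keys = new HashSet<>();
        for (InetSocketAddress s : servers) {
            keys.add(s.getHostString() + ":" + s.getPort());
        }
        return keys;
    }
}
```

Comparing as a set means a reshuffled but otherwise identical list is treated as unchanged, which is the case Curator triggers on each connection state change.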
As a permanent fix, instead of passing the current host based on the socket remote address, the client may need to remember the unresolved address that was used to connect (or use the original strings).
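One possible shape for that comparison is sketched below. It is only an illustration of the idea, not a proposed patch: UnresolvedMatch, inNewConfig and connectedVia are hypothetical names, and it assumes the client has kept the unresolved InetSocketAddress it originally dialed.

```java
import java.net.InetSocketAddress;
import java.util.Collection;

public class UnresolvedMatch {
    // connectedVia is assumed to be the unresolved address taken from the
    // server list when the connection was made, not the socket's remote
    // address. Both sides then carry the string from the connect string.
    static boolean inNewConfig(Collection<InetSocketAddress> newServers,
                               InetSocketAddress connectedVia) {
        for (InetSocketAddress addr : newServers) {
            if (addr.getPort() == connectedVia.getPort()
                    && addr.getHostString().equals(connectedVia.getHostString())) {
                return true;
            }
        }
        return false;
    }
}
```

Because both addresses come from the same connect string, an IP literal is compared with an IP literal and a host name with a host name, so no resolution is needed to decide whether the current server is still in the list.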
Filed this against 3.5.5. Based on source control, it looks like this still exists on master at the time of writing.
Issue Links
- is related to CURATOR-570 Excessive calls to ZooKeeper.updateServerList (which can result in session death) (Open)