On gw56.iu.xsede.org, where the develop branch of Airavata is deployed, there are currently over 4,000 Zookeeper connections in the TIME_WAIT state.
This number has stayed fairly constant for as long as I have been watching it. On gw77.iu.xsede.org, where the master branch is deployed, there are no such TIME_WAIT connections.
I looked into this a bit and wrote the following on HipChat:
[5:41 PM] Marcus Christie: From what I've been reading, I think the TIME_WAIT problem must be coming from Zookeeper clients connecting and then closing over and over again.
[5:42 PM] Marcus Christie: A TCP connection will stay in TIME_WAIT for about 4 minutes after it is closed http://stackoverflow.com/questions/10726049/what-is-the-reason-for-time-wait-connection-increasing-i...
[5:44 PM] Marcus Christie: There are consistently about 4,000 connections in TIME_WAIT. If they hang around for 4 minutes (240 seconds), then that means there must be 16.667 new connections being created (and eventually closed) each second.
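The estimate in the chat log can be reproduced with a small sketch. It assumes Linux-style `netstat -ant` output (six whitespace-separated columns ending in the state) and the default Zookeeper client port 2181. Both are assumptions, as are the helper names `count_time_wait` and `connection_rate`. The TIME_WAIT duration is left as a parameter because it varies by platform: the 240 seconds used above follows the classic 2 x MSL figure, while Linux kernels commonly hold TIME_WAIT for about 60 seconds, which would imply a higher connection rate for the same count.

```python
def count_time_wait(netstat_output: str, port: int = 2181) -> int:
    """Count TIME_WAIT rows whose local or remote port equals `port`.

    Assumes Linux `netstat -ant` rows of the form:
    proto recv-q send-q local-address foreign-address state
    """
    suffix = f":{port}"
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and any row that is not in TIME_WAIT.
        if len(fields) < 6 or fields[5] != "TIME_WAIT":
            continue
        if fields[3].endswith(suffix) or fields[4].endswith(suffix):
            count += 1
    return count


def connection_rate(time_wait_count: int, time_wait_seconds: float) -> float:
    """Steady-state estimate: connections closed per second.

    If the TIME_WAIT population is constant, connections enter the state
    at the same rate they leave it, so rate = count / duration.
    """
    return time_wait_count / time_wait_seconds


# The figures from the chat log: 4,000 connections, 240 seconds each.
print(round(connection_rate(4000, 240), 3))  # prints 16.667
```

With a 60-second TIME_WAIT the same 4,000 connections would correspond to roughly 67 connections per second, so the duration on the affected host is worth confirming before relying on the number.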
- smarru already tried purging old logs; see the Zookeeper docs
- Zookeeper has some administrative commands that are useful for finding out its self-reported statistics, such as the number of connections.
- to run these do
- useful links on TIME_WAIT
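As a rough illustration of the administrative commands mentioned in the notes, the sketch below sends one of Zookeeper's four-letter commands (for example `stat`, `cons`, or `ruok`) over a plain TCP socket and returns the reply. The host `localhost`, port 2181, and the helper name `four_letter_word` are assumptions, not values taken from the affected gateways. Newer Zookeeper releases only answer commands listed in the `4lw.commands.whitelist` setting, so a command may be rejected depending on the server configuration.

```python
import socket


def four_letter_word(command: str, host: str = "localhost",
                     port: int = 2181, timeout: float = 5.0) -> str:
    """Send a four-letter admin command and return the server's reply.

    The server writes its response and then closes the connection, so
    the reply is read until recv() returns an empty byte string.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example usage (requires a reachable Zookeeper server):
#   print(four_letter_word("stat"))  # server statistics and client list
#   print(four_letter_word("cons"))  # per-connection details
```

Note that each call opens and closes its own TCP connection, so polling these commands frequently would itself add to the TIME_WAIT count on the host.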