Currently, the ZKUtils, ZKLocks, and ZKJobsConcurrency services do not properly tear down their ZooKeeper connections when no callback was received from ZooKeeper to update the connection state.
We can get into this situation if, for example, ZooKeeper closed the session before any state-change callback was delivered. As a result, an Oozie server in HA mode can fail to terminate, with one or more sockets stuck in CLOSE_WAIT.
Here is an instance of this issue.
From the network connections, one connection is stuck in CLOSE_WAIT indefinitely:
tcp6 143 0 x.x.x.1:46710 x.x.x.2:2181 CLOSE_WAIT 4688/java off (0.00/0/0)
From the ZooKeeper logs:
2016-08-18 20:45:29,921 - INFO NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868 - Client attempting to establish new session at /x.x.x.1:46710
2016-08-18 20:45:29,926 - INFO CommitProcessor:1:ZooKeeperServer@617 - Established session 0x1569f576843000e with negotiated timeout 40000 for client /x.x.x.1:46710
2016-08-18 20:46:34,008 - INFO CommitProcessor:1:NIOServerCnxn@1007 - Closed socket connection for client /x.x.x.1:46710 which had sessionid 0x1569f576843000e
The fix is to skip the connection-state check during service destroy and tear down the ZooKeeper connections unconditionally.
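As a rough illustration of the pattern (the `ZkConnection` interface and method names below are hypothetical, not Oozie's actual API): if destroy only closes the connection when the last callback-reported state says "connected", a session that ZooKeeper expired before any callback fired is never closed, leaving the socket in CLOSE_WAIT. Closing unconditionally is safe because closing an already-dead connection is harmless.

```java
// Hypothetical sketch of the teardown pattern; not Oozie's real classes.
import java.util.concurrent.atomic.AtomicBoolean;

interface ZkConnection {
    // Connection state as last reported by a ZooKeeper callback;
    // stale if no callback was ever delivered.
    boolean isConnected();
    void close();
}

class ZkService {
    private final ZkConnection conn;

    ZkService(ZkConnection conn) {
        this.conn = conn;
    }

    // Buggy pattern: if ZK closed the session before any state-change
    // callback arrived, isConnected() reports false and close() is
    // skipped, so the socket stays in CLOSE_WAIT.
    void destroyCheckingState() {
        if (conn.isConnected()) {
            conn.close();
        }
    }

    // Fixed pattern: tear down regardless of the last reported state.
    void destroyUnconditionally() {
        try {
            conn.close();
        } catch (RuntimeException e) {
            // Best-effort cleanup during shutdown; swallow and continue.
        }
    }
}
```

With a connection whose state callback never arrived, `destroyCheckingState()` leaks the socket while `destroyUnconditionally()` closes it.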