I need a little help getting to the bottom of this (I might be misreading Hudson's logs).
The code in question is, I think, 'ok' (although a bit dodgy). The idea is to test the ability of a client that is waiting, because the max cnxns limit has been reached, to reconnect once a slot becomes free on the server. So ideally for this test close(1) should happen after createClient(2) has started trying to connect. As you say, this is a false assumption: the close might happen before createClient(2) has even made its attempt, in which case there is no contention. But this should only be giving false positives (the test passes without exercising the contention path), since the second assert should eventually succeed either way. What I need to do to improve this is to replace createClient with a call that blocks until we at least know the connection attempt has been made, if that's possible.
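To make the "blocks until we know the attempt has been made" idea concrete, here is a rough sketch of the kind of latch I have in mind. It assumes C++11 is available, and AttemptLatch, signal() and waitFor() are made-up names, not anything in the ZooKeeper C client or the existing tests. Whether the client exposes a hook from which signal() could be called at the right moment is exactly the open question.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper (not part of the ZooKeeper C client or its tests).
// One side calls signal() when the connection attempt is observed; the
// test thread calls waitFor() before going on to close the first client.
class AttemptLatch {
public:
    // Record that the attempt has happened and wake any waiting thread.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            attempted_ = true;
        }
        cond_.notify_all();
    }

    // Block until signal() has been called or the timeout expires.
    // Returns true if the attempt was observed in time, false otherwise.
    bool waitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cond_.wait_for(lock, timeout, [this] { return attempted_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    bool attempted_ = false;
};
```

The test would then only call close(1) once waitFor() returns true, and fail outright on a timeout rather than silently passing without contention.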
However the most recent Hudson failures don't seem to be related. From build 375:
[exec] Zookeeper_simpleSystem::testAsyncWatcherAutoReset : assertion
[exec] Zookeeper_watchers::testDefaultSessionWatcher1 : OK
[exec] Zookeeper_watchers::testDefaultSessionWatcher2 : OK
[exec] Zookeeper_watchers::testObjectSessionWatcher1 : OK
[exec] Zookeeper_watchers::testObjectSessionWatcher2 : OK
[exec] Zookeeper_watchers::testNodeWatcher1 : OK
[exec] Zookeeper_watchers::testChildWatcher1 : OK
[exec] Zookeeper_watchers::testChildWatcher2 : OK
[exec] /home/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestClient.cc:289: Assertion: equality assertion failed [Expected: -101, Actual : -4]
[exec] Failures !!!
[exec] Run: 32 Failure total: 1 Failures: 1 Errors: 0
[exec] make: *** [run-check] Error 1
and the same from 376 (yesterday's build). These are failing in TestClient (specifically testAsyncWatcherAutoReset). The error here is that a stat completion callback is getting called with ZCONNECTIONLOSS (-4) when it expects to see ZNONODE (-101), so the assert fails.
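For anyone else reading the log, this is how I am decoding the two numbers in the assertion message. The sketch below is only illustrative: describeRc and the k-prefixed constants are names I made up, with values mirroring ZCONNECTIONLOSS and ZNONODE from zookeeper.h. In real code the C client's own zerror() is the thing to use.

```cpp
#include <string>

// Illustrative constants; the values mirror ZCONNECTIONLOSS and ZNONODE
// as defined in the ZooKeeper C client header (zookeeper.h).
enum {
    kConnectionLoss = -4,
    kNoNode = -101
};

// Hypothetical helper: turn a return code from the log into a readable name.
std::string describeRc(int rc) {
    switch (rc) {
        case kNoNode:
            return "ZNONODE";
        case kConnectionLoss:
            return "ZCONNECTIONLOSS";
        default:
            return "other (" + std::to_string(rc) + ")";
    }
}
```

So "Expected: -101, Actual : -4" reads as "expected ZNONODE, got ZCONNECTIONLOSS".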
This test runs fine for me locally, so is the problem a heavily loaded Hudson, causing the connection loss?
Similarly, the failed build you point to, 371, fails TestClientRetry with a broken pipe error, which to my novice eye sounds a bit like something falling over under load.
It looks to me right now like the TestClientRetry code needs improving, but is benign as it should only cause false positives, and we need to understand the reasons why TestClient is failing. Does that sound right?