Final patch attached for integration. Reviewed this heavily with Ning and he approves. The core change provides lazy, non-blocking connect behavior for TNonblockingSocket when it interacts with TAsyncClientManager, by adding a new connecting state to the async method call. Also included:
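For context, a minimal sketch of the NIO pattern the lazy connect builds on (this is illustrative java.nio code, not the actual Thrift patch; the class name and the local echo server are invented for the example): connect() on a non-blocking channel returns immediately, and the selector later fires OP_CONNECT so the handshake can be finished lazily, after which the call transitions to its write state.

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class LazyConnectSketch {
    public static void main(String[] args) throws Exception {
        // Local server on an ephemeral port so the example is self-contained.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));

        Selector selector = Selector.open();
        SocketChannel client = SocketChannel.open();
        client.configureBlocking(false);

        // On a non-blocking channel, connect() returns immediately; the
        // TCP handshake completes later and is claimed via finishConnect().
        boolean connected = client.connect(server.getLocalAddress());
        if (!connected) {
            client.register(selector, SelectionKey.OP_CONNECT);
            boolean done = false;
            while (!done && selector.select(100) >= 0) {
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isConnectable() && client.finishConnect()) {
                        // Analogous to moving from the new "connecting" state
                        // to the existing write state of the method call.
                        key.interestOps(SelectionKey.OP_WRITE);
                        done = true;
                    }
                }
            }
        }
        System.out.println("connected=" + client.isConnected());
        client.close();
        server.close();
        selector.close();
    }
}
```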
1. bumped SELECT_TIME down so we get finer-grained timeouts
2. catch runtime exceptions across the entire manager select thread so that exceptions thrown from a callback don't kill it
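The idea in item 2 can be sketched like this (not the actual TAsyncClientManager code; `Callback` and `dispatch` are invented for illustration): callback dispatch inside the selector loop is wrapped in a try/catch so one bad callback cannot take the thread down.

```java
public class SelectLoopSketch {
    interface Callback {
        void onComplete();
    }

    // Stand-in for the dispatch step inside the manager's select loop.
    static void dispatch(Callback cb) {
        try {
            cb.onComplete();
        } catch (RuntimeException e) {
            // Log and keep the selector thread alive instead of dying.
            System.out.println("caught: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        dispatch(() -> { throw new RuntimeException("callback failure"); });
        // The loop survives and keeps servicing later callbacks.
        dispatch(() -> System.out.println("next callback still runs"));
    }
}
```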
3. refactored the async test a bit. Initial attempts had transient failures in the JankyRunnable portion of TestTAsyncClientManager: the caller would either never be called back for the connect, or we would get a SocketException: Connection reset by peer on finishConnect. When I profiled the test I found that the single TNonblockingServer$SelectThread was busy doing reads and writes all the way up until the timeout. Even though we make these calls in parallel, the server dispatches them serially, and that appeared to be the bottleneck: we could start a whole bunch of calls but not finish any of them until roughly the same time. If each call took 0.5ms in the worst case, that could be 500 threads * 100 calls * 0.5ms = 25s, so the one-second wait was simply too aggressive for running 500 threads with 100 calls per thread. I increased the timeout and reduced numThreads to 50 (which fixed the connection-reset problems that I can't otherwise explain), and I no longer see any failures after running the test 100 times on Snow Leopard or Linux.
This test is still really dirty, mostly because of copied-and-pasted code, but also because it asserts inside callbacks, so the assertion failures are thrown into (and swallowed by) the selector thread. We should store off the throwables and assert outside the callbacks. It also appears that part of why the test is so slow, and why I had to fiddle so much with thread counts and timeouts, is that we do synchronization within the callbacks to wake up the calling thread. That means the selector itself is constantly having to notify a bunch of threads, and I saw that taking real time in the stack traces (it blocks all other selector progress). A cleaner, faster test would probably use a truly async client as the driver, enqueueing the next iteration into a ConcurrentLinkedQueue from the callback rather than doing anything synchronized.
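A sketch of the store-the-throwables idea (hypothetical names; a stand-in check replaces a real RPC response): the callback records failures in a concurrent queue instead of throwing into the selector, and the test thread asserts afterwards, where a failure actually fails the test.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

public class CallbackErrorSketch {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<Throwable> errors = new ConcurrentLinkedQueue<>();
        CountDownLatch done = new CountDownLatch(1);

        // Stand-in for the selector thread invoking a completion callback.
        Thread selector = new Thread(() -> {
            try {
                String response = "ok"; // stand-in for a real RPC response
                if (!response.equals("ok")) {
                    throw new AssertionError("bad response: " + response);
                }
            } catch (Throwable t) {
                errors.add(t); // store it; never throw into the selector
            } finally {
                done.countDown();
            }
        });
        selector.start();
        done.await();

        // Assert on the test thread, after all callbacks have completed.
        System.out.println("errors=" + errors.size());
    }
}
```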
That said, I think it is sufficient for now and definitely covers the new non-blocking connect functionality. I also fixed a bunch of things about how the test logged errors and switched from wait/notify on plain objects to latches, since there was a race condition where the notify could happen before the caller waits.
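The latch fix works because a CountDownLatch remembers a countDown() that happens before anyone awaits, whereas an Object.notify() with no waiter is simply lost. A minimal sketch (illustrative names, not the test code):

```java
import java.util.concurrent.CountDownLatch;

public class LatchVsNotify {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);

        done.countDown(); // the "notify" happens first...
        done.await();     // ...and await() still returns immediately.
        System.out.println("latch did not hang");

        // The equivalent ordering with wait/notify deadlocks:
        //   synchronized (lock) { lock.notify(); } // lost, no waiter yet
        //   synchronized (lock) { lock.wait(); }   // blocks forever
    }
}
```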