We do have some race conditions there. We don't see them fail in the unit tests, because our #wait are bounded. But from a performance point of view, they do occur. I've reviewed them and fix all the issue I found excepted in the AM (haven't reviewed this one, may be it's fine).
On a perf test, this seems to improve the max latency.