Robbie, thanks for the quick response.
I can try to test this with a clustered setup; however, we have our f2f meeting next week, so if I don't get it done by tomorrow it will be delayed by a week. That is one reason for my reluctance to see this go through for 0.14.
While it is nice to have automated tests, experience has taught me that they provide very little confidence for a feature like failover. Almost all of the previous failover issues were found by manual testing with more real-life scenarios, or in production environments.
Most issues tend to surface when failover happens while a client is going full steam (producing, consuming, creating, querying, etc.). It is almost impossible to replicate that kind of scenario with automated testing, and the fact that there was no testing resembling production scenarios on your end makes me a bit concerned about the changes.
I've tried to semi-automate tests like this using the Python testkit, but never got them to a reasonable state, as they kept falling behind changes made on the clustering side and in brokertest.py.
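To give a rough idea of the shape of test I mean, here is a minimal, self-contained sketch: a producer runs at full speed while a "failover" event is injected mid-stream, and afterwards we verify nothing was lost. Everything here is a stand-in I made up for illustration — a real version would drive actual brokers via the testkit rather than this in-process queue.

```python
import threading
import queue

def run_failover_stress(total=1000, failover_at=500):
    # Stand-in for the active broker; a real test would use a
    # clustered broker pair and kill one member mid-run.
    broker = queue.Queue()
    failover = threading.Event()  # fired mid-run to mimic the failure

    def producer():
        for i in range(total):
            if i == failover_at:
                failover.set()  # inject the "failover" mid-stream
            broker.put(i)       # a real client would reconnect here

    t = threading.Thread(target=producer)
    t.start()
    t.join()

    received = []
    while not broker.empty():
        received.append(broker.get())

    # Failover must not drop or reorder messages.
    assert failover.is_set()
    assert received == list(range(total))
    return len(received)
```

The point is the structure — heavy concurrent traffic, a failure injected at an arbitrary point, and a strict end-to-end check — not the toy transport.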
Another hidden danger of a change like this is its impact on performance, which cannot be verified merely by running the automated tests. If we had an agreed-upon framework for benchmarking before and after fixes we suspect are perf-sensitive, that would be great. We could also run it on a per-release basis to catch regressions (or confirm improvements) on the perf front.
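The before/after comparison I have in mind could be as simple as the following sketch (names and tolerance are my own assumptions, not an existing tool): time the same workload against each build, compare medians, and flag anything that got meaningfully slower.

```python
import time
import statistics

def benchmark(workload, runs=5):
    # Time `workload` several times and return the median, which is
    # less noisy than a single run. `workload` is a stand-in for a
    # real client run, e.g. produce/consume N messages via a broker.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def regressed(before, after, tolerance=0.10):
    # True if `after` is more than `tolerance` (10% by default)
    # slower than `before`.
    return after > before * (1 + tolerance)
```

Run benchmark() once per build on the same hardware and feed the two medians to regressed(); anything beyond the tolerance gets flagged for a closer look before release.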
Anyway, if you feel confident about these changes and strongly feel they should be included in 0.14, then I'll take your word for it and won't make a fuss. But I have to note that so far I have very little confidence in the changes or the testing done here. Please note this is not a reflection on your work on this particular patch, but rather on the inadequate testing strategy we have for the client in general.