The TransactedTopicMasterSlaveTest hangs when run on Solaris. I examined the jstack output for the hanging test and discussed it with Gary Tully, who had the following suggestion:
this looks like an example of http://issues.apache.org/activemq/browse/AMQ-1993 but this time on the client side.
It seems like a write that results in the first reconnect attempt does not terminate, but it should be aborting as there is no listening or reading thread. I guess there is no backlog because the next write is blocked by the transport reconnect mutex. Some combination of low-level TCP retries should be configurable at the OS level, and I guess the defaults should be less than 1 hour, so I would expect this test to eventually complete. However, the OS-level timeouts and retries may be contingent on a backlog reaching a minimum, and currently there will be only one outstanding write due to the locking around the reconnect logic. This being the case, some code needs to implement the timeout.
The solution from http://issues.apache.org/activemq/browse/AMQ-1993 may be a good approach here. It will time out a write call.
However, the solution currently applies only to server-side sockets; it needs to be extended to support client connections.
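To make the idea of timing out a write call concrete, here is a minimal sketch in plain Java. It is not the AMQ-1993 code: the class name TimedWriter and its constructor arguments are hypothetical, and the approach (a watchdog thread that closes the connection when a write runs too long) is only one way such a timeout could be built. It relies on the documented behaviour that closing a java.net.Socket makes a thread blocked in I/O on that socket throw.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch, not ActiveMQ code: aborts a write that blocks for too long. */
public class TimedWriter {
    // A single daemon thread serves as the watchdog for all writers.
    private static final ScheduledExecutorService WATCHDOG =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "write-timeout-watchdog");
            t.setDaemon(true);
            return t;
        });

    private final OutputStream out;
    private final Closeable connection; // e.g. the java.net.Socket that owns 'out'
    private final long timeoutMillis;

    public TimedWriter(OutputStream out, Closeable connection, long timeoutMillis) {
        this.out = out;
        this.connection = connection;
        this.timeoutMillis = timeoutMillis;
    }

    /** Writes and flushes the data, or fails with an IOException once the timeout elapses. */
    public void write(byte[] data) throws IOException {
        AtomicBoolean timedOut = new AtomicBoolean(false);
        // If the write is still running when the timer fires, close the connection
        // so that the blocked write fails instead of hanging.
        ScheduledFuture<?> abort = WATCHDOG.schedule(() -> {
            timedOut.set(true);
            try {
                connection.close();
            } catch (IOException ignored) {
                // Nothing more can be done; the connection is being abandoned anyway.
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            out.write(data);
            out.flush();
        } catch (IOException e) {
            if (timedOut.get()) {
                throw new IOException("write timed out after " + timeoutMillis + " ms", e);
            }
            throw e;
        } finally {
            // The write finished (or failed on its own), so the timer is no longer needed.
            abort.cancel(false);
        }
    }
}
```

With a real socket this would be used as new TimedWriter(socket.getOutputStream(), socket, timeoutMillis). The point of doing it in code is that it does not depend on OS-level TCP retry settings or on a backlog building up.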
I think all that is needed is to move the additional configuration code from:
org.apache.activemq.transport.TransportFactory.serverConfigure(Transport, WireFormat, HashMap)
to compositeConfigure, which is called for all transports (both client and server).
Indeed, this fixes the problem. I've attached a patch against trunk which resolves the issue.
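For readers without the patch at hand, the shape of the change can be sketched roughly as follows. This is not the attached patch and not the real TransportFactory: the Transport, TcpTransport and WriteTimeoutFilter types below are simplified stand-ins, the signatures are reduced to a plain options map, and the option key "soWriteTimeout" is an assumption used only for illustration. It shows the write-timeout wrapping living in the shared configuration step so that client transports get it too.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the refactoring; every type here is a simplified stand-in. */
public class ConfigureSketch {
    public interface Transport {
        String describe();
    }

    public static final class TcpTransport implements Transport {
        @Override public String describe() { return "tcp"; }
    }

    /** Stand-in for a filter that would enforce a timeout on writes to the next transport. */
    public static final class WriteTimeoutFilter implements Transport {
        private final Transport next;
        private final long writeTimeoutMillis;

        public WriteTimeoutFilter(Transport next, long writeTimeoutMillis) {
            this.next = next;
            this.writeTimeoutMillis = writeTimeoutMillis;
        }

        @Override public String describe() {
            return "writeTimeout(" + writeTimeoutMillis + "ms) -> " + next.describe();
        }
    }

    // Assumed option key, for illustration only.
    static final String WRITE_TIMEOUT_OPTION = "soWriteTimeout";

    /** Shared step, run for every transport, client and server alike. */
    public static Transport compositeConfigure(Transport transport, Map<String, String> options) {
        // Previously this wrapping lived only in serverConfigure; it is now shared.
        String timeout = options.remove(WRITE_TIMEOUT_OPTION);
        if (timeout != null) {
            transport = new WriteTimeoutFilter(transport, Long.parseLong(timeout));
        }
        return transport;
    }

    /** Server-side configuration: server-only steps would go here, then the shared step. */
    public static Transport serverConfigure(Transport transport, Map<String, String> options) {
        return compositeConfigure(transport, options);
    }

    /** Client-side configuration: now also picks up the write timeout via the shared step. */
    public static Transport clientConfigure(Transport transport, Map<String, String> options) {
        return compositeConfigure(transport, options);
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(WRITE_TIMEOUT_OPTION, "5000");
        System.out.println(clientConfigure(new TcpTransport(), options).describe());
    }
}
```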