The nio+ssl transports can hang on the initial connection and never read from the socket after the SSL handshake under certain conditions. This behavior is most evident when using the auto+nio+ssl transport for a network bridge between two brokers. However, I also saw the issue with the normal nio+ssl transport when running the NetworkAsyncStartTest, and even with the amqp+nio+ssl transport.
After debugging, I found that the onSelect method of the registered callback, which calls the serviceRead() method, is not always triggered. I believe the root of the problem is that even though the channel is registered with the selector for SelectionKey.OP_READ, there is no guarantee that the selected set is accurate, and that set is what the SelectorWorker uses to detect whether the operation is ready. The SelectionKey documentation specifically states that the ready set is a hint, but not a guarantee, that the channel is ready. This seems to affect only the SSL transport (not normal NIO), probably because a read selection was already done once to unwrap data during the SSL handshake.
More info: https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SelectionKey.html
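To illustrate the point about the ready set, here is a minimal standalone sketch. It does not reproduce the hang and uses no ActiveMQ code; the class and method names (ReadySetSketch, readBeforeSelect) are made up for this example, and a java.nio.channels.Pipe stands in for the socket. It only shows that a key's ready set is updated by selection operations rather than being a live view of the channel, so data can be sitting on a channel while the key does not report it as readable, and a direct non-blocking read still picks it up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of ActiveMQ.
final class ReadySetSketch {

    /** What the key reported versus what a direct read actually found. */
    static final class Result {
        final boolean reportedReadable;
        final int bytesRead;

        Result(boolean reportedReadable, int bytesRead) {
            this.reportedReadable = reportedReadable;
            this.bytesRead = bytesRead;
        }
    }

    /**
     * Registers a pipe's source channel for OP_READ, writes the payload to
     * the sink, and then inspects the key WITHOUT running a selection
     * operation before reading directly from the channel.
     */
    static Result readBeforeSelect(byte[] payload) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false);
                SelectionKey key = source.register(selector, SelectionKey.OP_READ);

                // The sink is still in blocking mode, so a small payload is written fully.
                sink.write(ByteBuffer.wrap(payload));

                // The ready set starts at zero and only changes during a
                // selection operation, so this is false despite pending data.
                boolean reportedReadable = key.isReadable();

                // A direct non-blocking read does not depend on the ready set.
                ByteBuffer buffer = ByteBuffer.allocate(payload.length);
                long deadline = System.nanoTime() + 1_000_000_000L; // give up after 1 second
                while (buffer.hasRemaining() && System.nanoTime() < deadline) {
                    source.read(buffer);
                }
                return new Result(reportedReadable, buffer.position());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Result r = readBeforeSelect("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("reportedReadable=" + r.reportedReadable + " bytesRead=" + r.bytesRead);
    }
}
```

A callback that runs only when the key shows up as readable would not have fired at the point where reportedReadable is checked, even though the read that follows succeeds.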
The fix is to trigger serviceRead() after the transport finishes starting up. (For OpenWire specifically, this needs to happen in a new thread so that the wireformat negotiation does not block on startup.) This works for the SSL transport because we know the transport is ready to read from the channel after starting up: the SSL handshake has already taken place, which means we have already read from the channel.
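As a rough sketch of the shape of this fix (not the actual patch), the class below kicks off one read on a separate thread at the end of startup instead of waiting for the selector callback. Everything here is a hypothetical stand-in: InitialReadSketch is not an ActiveMQ class, its doStart() and serviceRead() only mimic the roles described above, and the single-thread executor is an assumption made to keep the example self-contained.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the real transport implementation.
final class InitialReadSketch {

    // Daemon worker so the example never keeps the JVM alive.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "initial-read");
        t.setDaemon(true);
        return t;
    });
    private final AtomicInteger reads = new AtomicInteger();
    private final CountDownLatch firstRead = new CountDownLatch(1);
    private volatile String readThreadName;

    /** Stand-in for the transport's read routine; a real one would drain the channel. */
    void serviceRead() {
        readThreadName = Thread.currentThread().getName();
        reads.incrementAndGet();
        firstRead.countDown();
    }

    /**
     * Stand-in for transport startup. The SSL handshake is assumed to have
     * completed already, so the first read is scheduled explicitly on another
     * thread and startup returns without waiting for it.
     */
    void doStart() {
        executor.execute(this::serviceRead);
    }

    /** Waits up to the given timeout for the initial read to run. */
    boolean awaitFirstRead(long timeoutMillis) {
        try {
            return firstRead.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int readCount() {
        return reads.get();
    }

    String readThreadName() {
        return readThreadName;
    }

    void stop() {
        executor.shutdownNow();
    }

    public static void main(String[] args) {
        InitialReadSketch transport = new InitialReadSketch();
        transport.doStart();
        System.out.println("initial read ran: " + transport.awaitFirstRead(2000));
        transport.stop();
    }
}
```

Handing the read to another thread keeps doStart() from blocking the caller, which is the property the OpenWire note above depends on.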
- relates to
AMQ-7115 Deadlock between MQTTInactivityMonitor and BrokerService Threads