ActiveMQ Classic / AMQ-6197

Problem using 2 or more NetworkConnectors in a single broker with NIO TransportConnectors


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 5.12.1
    • Fix Version/s: None
    • Component/s: Broker
    • Environment: RHEL 6.6, java-openjdk-1.7.0 u95

    Description

      Our test setup is a network of brokers consisting of 3+ brokers, each using the following broker configuration:

      <broker useJmx="${activemq.expose.jmx}" persistent="false"
          brokerName="${activemq.brokerName}" xmlns="http://activemq.apache.org/schema/core">
          <sslContext>
              <amq:sslContext keyStore="${activemq.broker.keyStore}"
                  keyStorePassword="${activemq.broker.keyStorePassword}"
                  trustStore="${activemq.broker.trustStore}"
                  trustStorePassword="${activemq.broker.trustStorePassword}" />
          </sslContext>
          <systemUsage>
              <systemUsage>
                  <memoryUsage>
                      <memoryUsage limit="${activemq.memoryUsage}" />
                  </memoryUsage>
                  <tempUsage>
                      <tempUsage limit="${activemq.tempUsage}" />
                  </tempUsage>
              </systemUsage>
          </systemUsage>
          <destinationPolicy>
              <policyMap>
                  <policyEntries>
                      <policyEntry queue=">" enableAudit="false">
                          <networkBridgeFilterFactory>
                              <conditionalNetworkBridgeFilterFactory
                                  replayWhenNoConsumers="true" />
                          </networkBridgeFilterFactory>
                      </policyEntry>
                  </policyEntries>
              </policyMap>
          </destinationPolicy>
          <networkConnectors>
              <networkConnector name="queues"
                  uri="static:(${activemq.otherBrokers})"
                  networkTTL="2" dynamicOnly="true"
                  decreaseNetworkConsumerPriority="true"
                  conduitSubscriptions="false">
                  <excludedDestinations>
                      <topic physicalName=">" />
                  </excludedDestinations>
              </networkConnector>
              <networkConnector name="topics"
                  uri="static:(${activemq.otherBrokers})"
                  networkTTL="1" dynamicOnly="true"
                  decreaseNetworkConsumerPriority="true"
                  conduitSubscriptions="true">
                  <excludedDestinations>
                      <queue physicalName=">" />
                  </excludedDestinations>
              </networkConnector>
          </networkConnectors>
          <transportConnectors>
              <transportConnector
                  uri="${activemq.protocol}${activemq.host}:${activemq.tcp.port}?needClientAuth=true"
                  updateClusterClients="true" rebalanceClusterClients="true" />
              <transportConnector
                  uri="${activemq.websocket.protocol}${activemq.websocket.host}:${activemq.websocket.port}?needClientAuth=true"
                  updateClusterClients="true" rebalanceClusterClients="true" />
          </transportConnectors>
      </broker>
      

      The following placeholder values are used:

      activemq.tcp.port=9000
      activemq.protocol=ssl://
      activemq.brokerName=activemq-server1.com
      activemq.expose.jmx=true
      activemq.otherBrokers=ssl://server2.com:9000,ssl://server3.com:9000
      activemq.websocket.port=9001
      activemq.websocket.protocol=stomp+ssl://
      activemq.websocket.host=server1.com
      activemq.memoryUsage=1gb
      activemq.tempUsage=1gb
      

      In order to improve CPU usage we altered the activemq.protocol placeholder from ssl:// to nio+ssl:// and immediately observed the hoped-for CPU improvement (the same applies when switching from tcp:// to nio://). However, after a new deployment of our ActiveMQ and a subsequent restart we started to see strange behavior: some producers would either get timeouts on their request-reply messages, or an "unknown destination" exception once the reply was sent on a temp queue. The issue only occurred when producer and consumer were connected to different brokers in the network.
      After some testing we found that after a restart a broker would often not start both network bridges (one for queues, one for topics) but only one of them. For example, in a 3-broker setup each broker usually has 4 network bridges active, 2 to each of the other brokers. During some restarts, however, we would see anywhere between 2 and 4 active bridges, and no matter how long we waited the missing 2nd bridge to another broker was never started. The logs showed nothing suspicious either: as long as one broker was shut down, the other two would log 'connection refused'; once it came back up they would log either 1 or 2 'successfully reconnected' messages and start exactly that number of bridges to it.
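      Concretely, the only change against the placeholder list above is:

      activemq.protocol=nio+ssl://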
      As soon as we switched the transport connector back to the ssl:// protocol the issue was gone for good: no matter how many restarts, all 4 network bridges were started on each broker. Switching back to nio://, the problem reappears right away.
      For now we are checking whether it is worth configuring an additional TransportConnector running nio:// just for producers and consumers, while the network bridges keep using the tcp:// (respectively ssl://) connector; a sketch of that idea follows below. The documentation on NetworkConnectors usually uses tcp:// or multicast:// (which is not an option for us) for the TransportConnector the bridges attach to, so we are not entirely sure whether nio:// is even supposed to work in this role or whether this is indeed a bug somewhere.
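      A minimal sketch of that workaround (untested), assuming the existing ssl:// connector keeps ${activemq.tcp.port} so that the static networkConnector URIs on the other brokers still attach to it; the connector names and the ${activemq.nio.port} placeholder are hypothetical additions made for this sketch:

      <transportConnectors>
          <!-- blocking ssl:// connector that the network bridges from the other brokers attach to -->
          <transportConnector name="openwire"
              uri="ssl://${activemq.host}:${activemq.tcp.port}?needClientAuth=true"
              updateClusterClients="true" rebalanceClusterClients="true" />
          <!-- nio+ssl:// connector reserved for producers and consumers only;
               the name and ${activemq.nio.port} are hypothetical placeholders -->
          <transportConnector name="nio-clients"
              uri="nio+ssl://${activemq.host}:${activemq.nio.port}?needClientAuth=true"
              updateClusterClients="true" rebalanceClusterClients="true" />
      </transportConnectors>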


          People

            Assignee: Unassigned
            Reporter: Daniel Hofer (daniel.hofer)
            Votes: 0
            Watchers: 3
