ActiveMQ Classic / AMQ-6197

Problem using 2 or more NetworkConnectors in a single broker with NIO TransportConnectors


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 5.12.1
    • Fix Version/s: None
    • Component/s: Broker
    • Environment: RHEL 6.6, java-openjdk-1.7.0 u95

    Description

      Our test setup is a network of brokers consisting of 3+ brokers, each using the following broker configuration:

      <broker useJmx="${activemq.expose.jmx}" persistent="false"
          brokerName="${activemq.brokerName}" xmlns="http://activemq.apache.org/schema/core">
          <sslContext>
              <amq:sslContext keyStore="${activemq.broker.keyStore}"
                  keyStorePassword="${activemq.broker.keyStorePassword}"
                  trustStore="${activemq.broker.trustStore}"
                  trustStorePassword="${activemq.broker.trustStorePassword}" />
          </sslContext>
          <systemUsage>
              <systemUsage>
                  <memoryUsage>
                      <memoryUsage limit="${activemq.memoryUsage}" />
                  </memoryUsage>
                  <tempUsage>
                      <tempUsage limit="${activemq.tempUsage}" />
                  </tempUsage>
              </systemUsage>
          </systemUsage>
          <destinationPolicy>
              <policyMap>
                  <policyEntries>
                      <policyEntry queue=">" enableAudit="false">
                          <networkBridgeFilterFactory>
                              <conditionalNetworkBridgeFilterFactory
                                  replayWhenNoConsumers="true" />
                          </networkBridgeFilterFactory>
                      </policyEntry>
                  </policyEntries>
              </policyMap>
          </destinationPolicy>
          <networkConnectors>
              <networkConnector name="queues"
                  uri="static:(${activemq.otherBrokers})"
                  networkTTL="2" dynamicOnly="true"
                  decreaseNetworkConsumerPriority="true"
                  conduitSubscriptions="false">
                  <excludedDestinations>
                      <topic physicalName=">" />
                  </excludedDestinations>
              </networkConnector>
              <networkConnector name="topics"
                  uri="static:(${activemq.otherBrokers})"
                  networkTTL="1" dynamicOnly="true"
                  decreaseNetworkConsumerPriority="true"
                  conduitSubscriptions="true">
                  <excludedDestinations>
                      <queue physicalName=">" />
                  </excludedDestinations>
              </networkConnector>
          </networkConnectors>
          <transportConnectors>
              <transportConnector
                  uri="${activemq.protocol}${activemq.host}:${activemq.tcp.port}?needClientAuth=true"
                  updateClusterClients="true" rebalanceClusterClients="true" />
              <transportConnector
                  uri="${activemq.websocket.protocol}${activemq.websocket.host}:${activemq.websocket.port}?needClientAuth=true"
                  updateClusterClients="true" rebalanceClusterClients="true" />
          </transportConnectors>
      </broker>
      

      The following placeholder values are used:

      activemq.tcp.port=9000
      activemq.protocol=ssl://
      activemq.brokerName=activemq-server1.com
      activemq.expose.jmx=true
      activemq.otherBrokers=ssl://server2.com:9000,ssl://server3.com:9000
      activemq.websocket.port=9001
      activemq.websocket.protocol=stomp+ssl://
      activemq.websocket.host=server1.com
      activemq.memoryUsage=1gb
      activemq.tempUsage=1gb
      

      In order to improve CPU usage we altered the activemq.protocol placeholder from ssl:// to nio+ssl:// and immediately observed the hoped-for CPU improvement (the same applies when switching from tcp:// to nio://). However, after a new deployment of our ActiveMQ and a subsequent restart we started to see strange behavior: some producers would either get timeouts on their request-reply messages, or an "unknown destination" exception once the reply was sent on a temp queue. The issue only occurred when producer and consumer were connected to different brokers in the network.
      After some testing we found that after a restart a broker would often not start both network bridges (one for queues, one for topics) but only one of them. For example, in a 3-broker setup each broker usually has 4 network bridges active, 2 to each of the other brokers. During some restarts, however, we would see anywhere between 2 and 4 active bridges, and no matter how long we waited the missing 2nd bridge to another broker was never started. The logs showed nothing suspicious either: as long as one broker was shut down, the other two would log 'connection refused'; once it came back up they would log either 1 or 2 'successfully reconnected' messages and start exactly that number of bridges to it.
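      Concretely, the only change against the placeholder list above is:

      activemq.protocol=nio+ssl://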
      As soon as we switched the transport connector back to the ssl:// protocol the issue was gone for good: no matter how many restarts, all 4 network bridges were started on each broker. Switching back to nio://, the problem reappears right away.
      For now we are checking whether it is worth configuring an additional TransportConnector running nio:// just for producers and consumers, while the network bridges keep using the tcp:// (respectively ssl://) connector; a sketch of that idea follows below. The documentation on NetworkConnectors usually uses tcp:// or multicast:// (which is not an option for us) for the TransportConnector the bridges attach to, so we are not entirely sure whether nio:// is even supposed to work in this role or whether this is indeed a bug somewhere.
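      A minimal sketch of that workaround (untested), assuming the existing ssl:// connector keeps ${activemq.tcp.port} so that the static networkConnector URIs on the other brokers still attach to it; the connector names and the ${activemq.nio.port} placeholder are hypothetical additions made for this sketch:

      <transportConnectors>
          <!-- blocking ssl:// connector that the network bridges from the other brokers attach to -->
          <transportConnector name="openwire"
              uri="ssl://${activemq.host}:${activemq.tcp.port}?needClientAuth=true"
              updateClusterClients="true" rebalanceClusterClients="true" />
          <!-- nio+ssl:// connector reserved for producers and consumers only;
               the name and ${activemq.nio.port} are hypothetical placeholders -->
          <transportConnector name="nio-clients"
              uri="nio+ssl://${activemq.host}:${activemq.nio.port}?needClientAuth=true"
              updateClusterClients="true" rebalanceClusterClients="true" />
      </transportConnectors>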


          People

            Assignee: Unassigned
            Reporter: Daniel Hofer (daniel.hofer)
            Votes: 0
            Watchers: 3
