ActiveMQ / AMQ-5605

High CPU load when using failover transport in network connector

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.11.0, 5.11.1
    • Fix Version/s: 5.12.0, 5.11.3
    • Component/s: None
    • Labels:
    • Environment: Ubuntu 14.04, SuSE Enterprise 11.3

      Description

      I have a configuration with two master/slave setups, each consisting of three ActiveMQ instances. They are deployed on three servers, with one ActiveMQ instance from each master/slave setup on every server. They use LevelDB and ZooKeeper. Without a network connector, everything works fine.

      The strange behaviour appears when I add a network connector to each ActiveMQ instance like this:

      <networkConnector name="toMasterSlave02" dynamicOnly="true" uri="masterslave:(tcp://host1:61617,tcp://host2:61617,tcp://host3:61617)"/>

      When you restart the master of the master/slave setup that is targeted by the above network connector, the CPU load on the current master rises to 100% and stays there, i.e. it uses one CPU per configured transportConnector.

      The explanation below is mostly copied from http://activemq.2283324.n4.nabble.com/High-CPU-load-with-network-connector-failover-transport-tp4691798.html

      When one of the brokers is restarted, the other broker uses ~400% CPU. The cause is the FailoverTransport reconnectTask spinning, and nothing is stopping the task.
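
      The spinning described above can be illustrated with a toy model. This is only a sketch, not ActiveMQ code: the names SpinningTaskSketch, runUntilDone and failingReconnect are hypothetical, and the iteration budget exists only so the example terminates. It assumes a task runner that re-runs a task immediately whenever the task reports that it has more work to do.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Toy model only; none of these names are ActiveMQ classes. */
final class SpinningTaskSketch {

    /**
     * Re-runs the task as long as it returns true, the stop flag is unset,
     * and the iteration budget is not used up. Returns the iteration count.
     */
    static long runUntilDone(BooleanSupplier task, AtomicBoolean stopped, long maxIterations) {
        long iterations = 0;
        while (iterations < maxIterations && !stopped.get()) {
            iterations++;
            if (!task.getAsBoolean()) {
                break; // the task says it is finished
            }
        }
        return iterations;
    }

    /** A reconnect attempt that always fails and asks to be run again at once. */
    static boolean failingReconnect() {
        return true; // "more work to do": no backoff, no wait
    }
}
```

      In this model, a failing reconnect consumes the whole budget unless something sets the stop flag. A real runner has no such budget, so the loop would keep a CPU core busy indefinitely.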

      Reverting the fix made for AMQ-5315 reintroduces the NullPointerException, but failover is then handled properly without spinning:
      https://git1-us-west.apache.org/repos/asf/activemq/repo?p=activemq.git;a=commitdiff;h=c391321d1b5b59542d847717654b0d4dba54cf2f

      The reason it works after reverting that change is that the NullPointerException is caught, which leads to serviceLocalException() -> ServiceSupport.dispose(getControllingService()). With the fix made in AMQ-5315, the dispose() call is never made.
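
      The difference between the two code paths can be sketched as follows. This is a toy model, not ActiveMQ source: the class DisposePathSketch, the method onRemoteRestart and the parameter remoteState are hypothetical, and the sketch assumes that the AMQ-5315 fix amounts to a null guard, which this report does not state. Only the names serviceLocalException() and dispose() are borrowed from the call chain above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model of the two code paths; not ActiveMQ source. */
final class DisposePathSketch {
    final AtomicBoolean disposed = new AtomicBoolean(false);
    private final boolean nullGuard; // true models the assumed AMQ-5315 fix

    DisposePathSketch(boolean nullGuard) {
        this.nullGuard = nullGuard;
    }

    /** Called when the remote side went away; remoteState is null in that case. */
    void onRemoteRestart(Object remoteState) {
        try {
            if (nullGuard && remoteState == null) {
                return; // guarded path: no exception, so no cleanup either
            }
            remoteState.hashCode(); // throws NullPointerException when remoteState is null
        } catch (NullPointerException e) {
            serviceLocalException(e);
        }
    }

    /** Mirrors the chain in the report: a local exception leads to dispose(). */
    void serviceLocalException(Throwable error) {
        dispose();
    }

    void dispose() {
        disposed.set(true); // in the real broker this would stop the controlling service
    }
}
```

      With nullGuard set to false (the reverted state), calling onRemoteRestart(null) ends in dispose(). With nullGuard set to true, the method returns early and disposed stays false, so in this model nothing would ever stop a still-running reconnect task.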

      Sorry, but I've got no clue how to provide a unit test for this. Maybe someone else can help.

            People

            • Assignee: Unassigned
            • Reporter: Lars Neumann (larsn)
            • Votes: 1
            • Watchers: 6
