ActiveMQ / AMQ-3575

Failover transport race condition causes intermittent incomplete bridge connections

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.5.0
    • Fix Version/s: 5.6.0
    • Component/s: Transport
    • Labels:
      None
    • Environment:

      CentOS 5.5 and Mac OSX10

      Description

      There is a race condition in FailoverTransport.java that sometimes prevents network bridge connections from starting. This is a serious issue: it prevented us from setting up failover connections between brokers. I would have asked for Critical priority if there were no workaround. The workaround I found is as follows:

      Turn on the ActiveMQ thread pooling option to avoid the failover bridge connection race condition, by setting the following property to false in your start script. Somehow this got me around the problem of the wrong thread sometimes winning:
      -Dorg.apache.activemq.UseDedicatedTaskRunner=false
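To confirm that the flag actually reached the JVM, a small check like the following could be run with the same start-script options. This is only a sketch: the class and method names are hypothetical, and it merely reads the system property named above. It does not show how ActiveMQ itself interprets the value.

```java
/** Hypothetical helper; not part of ActiveMQ. */
class TaskRunnerFlagCheck {

    // Property name taken from the workaround above.
    static final String KEY = "org.apache.activemq.UseDedicatedTaskRunner";

    /** True only when the property was explicitly set to "false" (pooled task runners requested). */
    static boolean pooledRequested(String value) {
        return "false".equalsIgnoreCase(value);
    }

    public static void main(String[] args) {
        String value = System.getProperty(KEY); // null when the -D flag is absent
        System.out.println(KEY + "=" + value + ", pooled requested: " + pooledRequested(value));
    }
}
```

Running it as `java -Dorg.apache.activemq.UseDedicatedTaskRunner=false TaskRunnerFlagCheck` should report that pooling was requested. Without the flag, the value is null.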

      I've attached a unit test to be dropped in activemq-core/src/test/java/org/apache/activemq/transport/failover. The unit test shows that when a delay is introduced before the TransportListener is set, the BrokerInfo command required to complete the bridge connection is never processed. There are two unit tests in this class and both are designed to pass. The test called "testTcpThreadWinsPreventsCompletionOfBridge" passes by asserting that it did not receive the BrokerInfo command. By setting the delay value you can control whether the main thread wins (in which case all is well) or the TCP thread wins (in which case the network bridge hangs and fails to start).
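The ordering problem described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the attached unit test and not ActiveMQ's FailoverTransport: `ToyTransport`, `Listener`, and `run` are hypothetical names, and the model simply assumes that a command arriving before a listener is attached gets dropped. Latches stand in for the configurable delay so that each ordering is deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Toy model of the race; hypothetical classes, not ActiveMQ code. */
class ListenerRaceSketch {

    interface Listener { void onCommand(String command); }

    static final class ToyTransport {
        private final AtomicReference<Listener> listener = new AtomicReference<>();

        void setListener(Listener l) { listener.set(l); }

        // Called from the "TCP" thread. Assumption: with no listener attached yet,
        // the command is silently dropped.
        void deliver(String command) {
            Listener l = listener.get();
            if (l != null) {
                l.onCommand(command);
            }
        }
    }

    /** Returns true if the BrokerInfo command reached the listener. */
    static boolean run(boolean tcpThreadWins) {
        ToyTransport transport = new ToyTransport();
        AtomicBoolean brokerInfoProcessed = new AtomicBoolean(false);
        CountDownLatch listenerSet = new CountDownLatch(1);
        CountDownLatch delivered = new CountDownLatch(1);

        Thread tcp = new Thread(() -> {
            try {
                if (!tcpThreadWins) {
                    listenerSet.await(); // main thread wins: wait for the listener
                }
                transport.deliver("BrokerInfo");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                delivered.countDown();
            }
        }, "tcp-thread");
        tcp.start();

        try {
            if (tcpThreadWins) {
                delivered.await(); // simulated delay before the listener is set
            }
            transport.setListener(cmd -> brokerInfoProcessed.set("BrokerInfo".equals(cmd)));
            listenerSet.countDown();
            tcp.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return brokerInfoProcessed.get();
    }

    public static void main(String[] args) {
        System.out.println("main thread wins: processed = " + run(false));
        System.out.println("tcp thread wins:  processed = " + run(true));
    }
}
```

In this model the command is processed only when the listener is attached before delivery, which mirrors the two outcomes the unit test distinguishes. The real mechanism inside FailoverTransport may differ in detail.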

      Note: this issue only affects network bridge connections that are set up with the failover transport, such as a broker that connects to a master/slave pair, e.g. failover://(tcp://master:61616,tcp://slave:61616)?randomize=false

          Activity

          Gary Tully added a comment -

          Can you validate against trunk, with

           Assert.assertTrue("Unexpected state: BrokerInfo command was processed", brokerInfoProcessed);

          With that assertion the test passes on trunk which, if I understand you correctly, shows that this is fixed. Correct?

          In general, using static:failover: has proven to be problematic. Failover hides transport errors, but a network bridge is designed to recover from such errors by recreating the bridge, so failover should be configured not to reconnect.
          see: https://issues.apache.org/jira/browse/AMQ-3542
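As an illustration of "configured not to reconnect", the sketch below assembles a network connector URI in which the nested failover transport gives up after the first failure. It rests on assumptions not stated in this issue: that the broker is ActiveMQ 5.6 or later, where `maxReconnectAttempts=0` is documented to disable retries (in earlier versions 0 meant retry forever), and that the broker addresses are the ones from the description. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that only builds URI strings; it does not talk to a broker. */
class BridgeUriSketch {

    // Assumption: maxReconnectAttempts=0 disables reconnects (ActiveMQ 5.6+ semantics).
    static String failoverWithoutReconnect(List<String> brokers) {
        return "failover:(" + String.join(",", brokers)
                + ")?randomize=false&maxReconnectAttempts=0";
    }

    // Wraps the failover URI in a static discovery URI for a network connector.
    static String staticNetworkUri(List<String> brokers) {
        return "static:(" + failoverWithoutReconnect(brokers) + ")";
    }

    public static void main(String[] args) {
        List<String> brokers = Arrays.asList("tcp://master:61616", "tcp://slave:61616");
        System.out.println(staticNetworkUri(brokers));
    }
}
```

When such a URI is placed in an XML broker configuration, the `&` has to be written as `&amp;`.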

          Gary Tully added a comment -

          Linking to the issue that I think resolved this.

          Timothy Bish added a comment -

          Testing with the proper assertion and repeated attempts shows that the issue is fixed on trunk.


            People

            • Assignee:
              Timothy Bish
              Reporter:
              Aaron Phillips
            • Votes:
              1
              Watchers:
              1
