ActiveMQ / AMQ-3353

Durable subscribers on durable topics don't receive messages after network disconnect

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.3.1, 5.5.0
    • Fix Version/s: 5.6.0
    • Component/s: Broker
    • Labels: None
    • Environment:

      Windows & Linux
      JDK 1.6

      Description

      I've set up a durable topic with the default (persistent) delivery mode on one machine, publishing a simple text message every 5 seconds. I created a durable subscriber on another machine that consumes messages published to that topic. I am using broker-to-broker communication between the two machines.
      I start the two programs on either machine and see the messages coming through to the subscriber. If I then pull the network cable to disconnect the two machines, wait for a minute, and plug it back in, my subscriber no longer receives messages. I can see from the output that the publisher is still publishing them. (Temporary topics and non-durable queues all continue to sync up in our production environment; it is only the durable topics that don't work after a network reconnect.)

      If I tweak a setting on the publisher's broker that was introduced in 5.5.0, suppressDuplicateTopicSubscriptions=false, then the topics work correctly after a network reconnect (a sketch of applying this setting follows the questions below). But this may have other unintended consequences, and I was hoping to get a better idea of:

      • Is this a known issue? If so, are there any specific challenges that have kept it from being fixed?
      • Are other people out there using durable topics and subscribers without a failover option that have run into this problem? What have they done to work around it?
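
      A minimal sketch of how that broker-side setting might be applied programmatically (the broker name and URI are illustrative, not the actual configuration):

      import org.apache.activemq.broker.BrokerService;
      import org.apache.activemq.network.NetworkConnector;

      public class PublisherBrokerSketch {
          public static void main(String[] args) throws Exception {
              BrokerService broker = new BrokerService();
              broker.setBrokerName("publisherBroker"); // illustrative name
              // Bridge from the publisher's broker to the broker on the subscriber's machine
              NetworkConnector nc = broker.addNetworkConnector("static:(tcp://machine2:61616)");
              // Introduced in 5.5.0 and enabled by default; disabling it is the
              // work-around described above for durable topics after a reconnect
              nc.setSuppressDuplicateTopicSubscriptions(false);
              broker.start();
          }
      }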

      Here is how my subscriber and publisher are set up:
      Topic Publisher (Machine 1):
      publisherConnection = connFactory.createConnection();
      publisherConnection.setClientID( "ProducerCliID" );
      publisherConnection.start();

      // Transacted session; the acknowledge-mode argument is ignored when transacted
      session = publisherConnection.createSession( true, Session.SESSION_TRANSACTED );
      Destination producerTopic = session.createTopic( TEST_TOPIC_NAME );
      producer = session.createProducer( (Topic)producerTopic );
      ....
      // On a timer, keep sending this out every 5 seconds
      String text = "HELLO " + count++;
      TextMessage msg = session.createTextMessage( text );
      System.out.println( "Sending TextMessage = " + msg.getText() );
      producer.send( msg );
      session.commit();

      Subscriber (Machine 2):
      Connection clientConnection = connFactory.createConnection();
      clientConnection.setClientID( "cliID" );
      clientConnection.start();
      Session session = clientConnection.createSession( false, Session.AUTO_ACKNOWLEDGE );

      Destination topic = session.createTopic( topicName );
      MessageConsumer subscriber = session.createDurableSubscriber( (Topic)topic, "subName" );

      TestMessageListener msgListener = new TestMessageListener( 1000 );
      subscriber.setMessageListener( msgListener );
      .....
      // TestMessageListener's onMessage method simply outputs the message text:
      public void onMessage( Message message )
      {
          if ( message instanceof TextMessage )
          {
              System.out.println( "Message received = " + ((TextMessage)message).getText() );
          }
      }

      I can provide the jars for you to run the program if need be.

      1. test-results.ods
        12 kB
        Andreas Calvo
      2. TEST-org.apache.activemq.usecases.DurableSubscriberWithNetworkDisconnectTest.xml
        65 kB
        Andreas Calvo
      3. TEST-org.apache.activemq.usecases.DurableSubscriberWithNetworkDisconnectTest.xml
        480 kB
        Andreas Calvo
      4. standalone2.xml
        4 kB
        Andreas Calvo
      5. standalone1.xml
        4 kB
        Andreas Calvo
      6. instructions.txt
        2 kB
        Andreas Calvo
      7. example.tar.gz
        19 kB
        Andreas Calvo
      8. embedded2.xml
        4 kB
        Andreas Calvo
      9. embedded1.xml
        4 kB
        Andreas Calvo
      10. DurableSubscriberWithNetworkDisconnectTest.java
        9 kB
        Andreas Calvo
      11. DurableSubscriberWithNetworkDisconnectTest.java
        9 kB
        Andreas Calvo
      12. DurableSubscriberWithNetworkDisconnectTest.java
        10 kB
        Andreas Calvo

        Activity

        Andreas Calvo added a comment -

        same problem here.

        Tried with 5.4.2, 5.5.0, 5.6.0-SNAPSHOT and still having the same issue.

        I've set up an environment with two brokers, and the producer and consumer provided in the example directory using durable topics.

        After being disconnected for more than a minute, the brokers are able to reconnect, but the consumer seems to be in an offline state and does not receive pending messages.
        However, upon restarting the consumer, it'd get all pending messages.

        Is there some kind of timeout between the consumer and the broker?

        Joe Shisei Niski added a comment - edited

        I also have this problem on AMQ 5.3.0 (in production) and in our AMQ 5.4.2 test environment (both on 32-bit Red Hat Enterprise Linux 5.5).

        We have a store & forward network of brokers, much like the back-office/retail-store example in ActiveMQ in Action. I use a duplexed networkConnector on the subscribers, with durable topic subscriptions. Queues and topics are defined as <dynamicallyIncludedDestinations> in the <networkConnector> configuration.
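
        For illustration, a rough programmatic equivalent of that connector setup (host and destination names are assumptions, not the actual configuration):

        import java.util.Arrays;
        import org.apache.activemq.broker.BrokerService;
        import org.apache.activemq.command.ActiveMQDestination;
        import org.apache.activemq.command.ActiveMQQueue;
        import org.apache.activemq.command.ActiveMQTopic;
        import org.apache.activemq.network.NetworkConnector;

        public class StoreBridgeSketch {
            public static void main(String[] args) throws Exception {
                BrokerService storeBroker = new BrokerService();
                storeBroker.setBrokerName("storeBroker");
                // Duplex bridge: one outbound connection carries traffic in both directions
                NetworkConnector nc = storeBroker.addNetworkConnector("static:(tcp://backoffice:61616)");
                nc.setDuplex(true);
                // Only these destinations are forwarded across the bridge
                nc.setDynamicallyIncludedDestinations(Arrays.<ActiveMQDestination>asList(
                        new ActiveMQTopic("STORE.EVENTS"),
                        new ActiveMQQueue("STORE.ORDERS")));
                storeBroker.start();
            }
        }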

        After a disconnect, I can see the brokers reconnecting in both the publisher's and subscribers' logs. Also, the AMQ web console on the publisher continues to show the durable subscriptions as online. But topic messages published during/after the outage aren't picked up unless I restart the subscriber broker.

        Furthermore, queue messages do flow successfully between the two brokers after the reconnect.

        I've added "keepAlive=true" and "useExponentialBackoff=false" to the URIs for my openwire and ssl transportConnectors, but neither setting has made any difference in the problem behavior.

        Our current work-around is to monitor the message counts on the publisher's and subscriber's topics, send an alert to our sysadmin team when they get out of sync, and manually restart the subscriber.
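
        For illustration, the monitoring side of that work-around might look roughly like this over JMX (service URL, broker name, and topic are assumptions; the ObjectName layout follows the pre-5.8 naming scheme):

        import javax.management.MBeanServerConnection;
        import javax.management.MBeanServerInvocationHandler;
        import javax.management.ObjectName;
        import javax.management.remote.JMXConnector;
        import javax.management.remote.JMXConnectorFactory;
        import javax.management.remote.JMXServiceURL;
        import org.apache.activemq.broker.jmx.TopicViewMBean;

        public class TopicLagCheck {
            public static void main(String[] args) throws Exception {
                JMXServiceURL url = new JMXServiceURL(
                        "service:jmx:rmi:///jndi/rmi://subscriber-host:1099/jmxrmi");
                try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
                    MBeanServerConnection conn = jmxc.getMBeanServerConnection();
                    ObjectName name = new ObjectName(
                            "org.apache.activemq:BrokerName=subBroker,Type=Topic,Destination=STORE.EVENTS");
                    TopicViewMBean topic = MBeanServerInvocationHandler.newProxyInstance(
                            conn, name, TopicViewMBean.class, true);
                    // Enqueued minus dequeued: a growing gap means the durable sub is stuck
                    long lag = topic.getEnqueueCount() - topic.getDequeueCount();
                    if (lag > 0) {
                        System.out.println("ALERT: durable sub lag = " + lag + " messages");
                    }
                }
            }
        }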

        It's a relief to know that I'm not the only one...

        Andreas Calvo added a comment -

        (initial) JUnit case

        Gary Tully added a comment -

        Andreas, thanks for the unit test. It has 32 tests in it (with all the variations); is that expected? I am giving it a run to see what it produces.

        Andreas Calvo added a comment -

        Gary,
        We set up a bunch of options, such as failover over static, dynamic only, maxInactivity on the wireFormat, and so on.
        It should always work, but some combinations are failing.

        Moreover, it also performs a close or pause on the socket to simulate the network drop.

        Andreas Calvo added a comment -

        surefire report

        Andreas Calvo added a comment -

        We found out that it is able to recover when performing a socket.pause(), but not with a socket.close() (which, by the way, is closest to the behavior we expect).
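
        A minimal sketch of those two fault modes, using the SocketProxy helper from the ActiveMQ test sources (the target URI is illustrative):

        import java.net.URI;
        import org.apache.activemq.util.SocketProxy;

        public class NetworkDropSimulator {
            public static void main(String[] args) throws Exception {
                // The proxy sits between the two brokers; the bridge connects through proxy.getUrl()
                SocketProxy proxy = new SocketProxy(new URI("tcp://remote-broker:61616"));
                System.out.println("Point the network connector at " + proxy.getUrl());

                // pause(): traffic stalls but sockets stay open -- recovery works
                proxy.pause();
                Thread.sleep(60000);
                proxy.goOn();

                // close(): sockets are torn down, closest to a pulled cable -- recovery fails
                proxy.close();
                Thread.sleep(60000);
                proxy.reopen();
            }
        }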

        Gary Tully added a comment -

        The problem is the default value for setSuppressDuplicateTopicSubscriptions.

        In the test, add

        connector.setSuppressDuplicateTopicSubscriptions(false);

        to

         org.apache.activemq.usecases.DurableSubscriberWithNetworkDisconnectTest#bridgeBrokers

        That option was added for cyclic network topologies, where there is more than one route to a broker. But it should ignore durable subscriptions, as they are already unique based on their subscription id.
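
        Sketched out, the tweak goes roughly like this in the test subclass (the bridgeBrokers signature below is an assumption based on the test support class, not verified):

        @Override
        protected NetworkConnector bridgeBrokers(BrokerService localBroker, BrokerService remoteBroker,
                boolean dynamicOnly, int networkTTL, boolean conduit) throws Exception {
            NetworkConnector connector = super.bridgeBrokers(localBroker, remoteBroker,
                    dynamicOnly, networkTTL, conduit);
            // Durable subs are already unique by clientId:subscriptionName, so the
            // duplicate-suppression check must not block their re-registration
            connector.setSuppressDuplicateTopicSubscriptions(false);
            return connector;
        }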

        Andreas Calvo added a comment -

        We're concerned about the cyclic topology with a network of brokers.
        If a durable topic subscription is requested more than once on the same broker, will it be ignored?
        thanks!

        Gary Tully added a comment -

        Yes, durable subs must be unique based on their connection id and subscriber name; a duplicate will be suppressed. Validated with a variation on the duplicate suppression test.
        I am going to ensure that setSuppressDuplicateTopicSubscriptions ignores durable subs, such that it does not get in the way of a normal durable sub reconnect, where the networked durable sub is offline pending a reconnect.

        Gary Tully added a comment -

        Thinking more on this, the failover: protocol on a client is relevant here, in that the networkConsumerId on a consumer advisory will be the same in the failover case and different in the non-failover case. This is distinct from the subscriberName:clientId pair. It may be that the networkConsumerId needs to be composed of the subscriberName:clientId for durable subs to properly implement duplicate subscription suppression.

        Andreas Calvo added a comment - edited

        @Gary
        We are still trying to figure out this issue.
        Using the latest SNAPSHOT for 5.6 doesn't solve it.
        Our main concern is that using a NoB (network of brokers) with static connections (creating the responder from the central broker, with duplex from the embedded broker) seems to throw an error.
        However, using failover does not throw an error, but does not work either.

        PS: the NoB architecture is hub-and-spoke using embedded brokers:
        broker 1 (embedded) <-> central broker <-> broker 2 (embedded)
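
        For illustration, each embedded spoke in that layout might be brought up roughly like this (names and the hub URI are assumptions):

        import org.apache.activemq.broker.BrokerService;
        import org.apache.activemq.network.NetworkConnector;

        public class EmbeddedSpokeSketch {
            public static void main(String[] args) throws Exception {
                BrokerService spoke = new BrokerService();
                spoke.setBrokerName("broker1");
                spoke.setPersistent(true); // durable subscriptions need a persistent store
                // Duplex: the spoke dials out and the hub replies over the same
                // connection, so the hub needs no connector back to the spoke
                NetworkConnector toHub = spoke.addNetworkConnector("static:(tcp://central:61616)");
                toHub.setDuplex(true);
                spoke.start();
            }
        }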

        Joe Shisei Niski added a comment -

        Greetings:

        As of Friday, 16 December, I'm no longer working at NWEA.

        If you need assistance with Web-based MAP messaging, please contact Scott Dietrich (scott.dietreich@nwea.org) or Aaron Zoller (aaron.zoller@nwea.org).

        Kind regards,
        Joe

        Gary Tully added a comment - edited

        @Andreas, so where do we stand?
        Do you have a variant of the test that shows the problem you are seeing that I can play with? I.e., a variant of the current test (DurableSubscriberWithNetworkDisconnectTest) from trunk that will fail? Please attach it to this JIRA if you do.

        Gary Tully added a comment -

        Resolving, pending an additional unit test that shows there is still a problem.

        Andreas Calvo added a comment - edited

        @Gary,
        We're still having the same issue.
        It has become critical since it affects the brokers in a network of brokers when one of them is "pulled out" of the network and reconnected after the inactivity timeout.
        We'll try to update our test case with the latest stable release of 5.6 and report back.

        A small snippet of the output after the client reconnects (the removed broker is plugged back into the network):
        jvm 1 | INFO | brokerA Shutting down
        jvm 1 | WARN | Could not start network bridge between: vm://brokerA?async=false&network=true and: ssl://10.4.0.164:61616 due to: java.net.SocketTimeoutException: connect timed out
        jvm 1 | INFO | Establishing network connection from vm://brokerA?async=false&network=true to ssl://10.4.0.164:61616
        jvm 1 | INFO | Network connection between vm://brokerA#42 and ssl://10.4.0.164/10.4.0.164:61616(brokerB) has been established.

        Notice how, although it is identified as connected, it is not shown as a durable subscriber on the topic.

        Andreas Calvo added a comment -

        Use case updated to stable ActiveMQ 5.6.
        Note that the timeout parameter in the failover transport protocol has changed and is now -1 instead of 0.
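
        For illustration, the option in question on a client URL (the URI is illustrative; with timeout=-1 a blocked send waits indefinitely while failover reconnects, rather than failing after a fixed period):

        import javax.jms.Connection;
        import org.apache.activemq.ActiveMQConnectionFactory;

        public class FailoverTimeoutSketch {
            public static void main(String[] args) throws Exception {
                // timeout=-1: never time out a send while the transport is reconnecting
                ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                        "failover:(tcp://central:61616)?timeout=-1&maxReconnectAttempts=-1");
                Connection connection = factory.createConnection();
                connection.start();
            }
        }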

        Timothy Bish added a comment -

        Tried your new test case against the latest trunk SNAPSHOT and it passes. There have been a couple of recent fixes since 5.6.0 for issues that could affect durable consumers, so you may want to try a 5.7-SNAPSHOT.

        Andreas Calvo added a comment -

        New test with 3 combinations for maxReconnectAttempts (0, 1 and -1).
        It fails 13 out of 24.
        Using 5.7-SNAPSHOT.

        Repository Root: https://svn.apache.org/repos/asf
        Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
        Revision: 1346054
        Node Kind: directory
        Schedule: normal
        Last Changed Author: gtully
        Last Changed Rev: 1345202
        Last Changed Date: 2012-06-01 16:32:50 +0200 (Fri, 01 Jun 2012)

        Andreas Calvo added a comment - edited

        To rule out a bad implementation on our side, I've reproduced the problem using the examples provided in the stable release.
        I had to modify the Java code for both the ConsumerTool and the ProducerTool to add a string property with the sender string.
        I am reusing the producerClientId parameter to set the sender name.
        The file example.tar.gz contains the example directory from the ActiveMQ 5.6 stable release with the modified files.

        The instructions.txt contains a detailed description of the process used.

        Antonio added a comment -

        As Andreas Calvo pointed out in this comment:

        https://issues.apache.org/jira/browse/AMQ-3353?focusedCommentId=13294334&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13294334

        this problem is still present in ActiveMQ 5.6; it's not fixed. I don't understand why this issue is closed. I'm affected too, in exactly the same situation Andreas is experiencing.

        Andreas, is it fixed for you?

        Noel Ady added a comment -

        We have been experiencing this same issue on 5.7.0. When the AMQ server is rebooted, all clients reconnect as expected. However, durable subscribers show as active but the consumers are not pulling new messages. You can see on the subscribers page that messages are being enqueued to the durable sub but not dequeued. It has caused some expensive mistakes recently. Having read through the above, I am still unclear on what the best workaround is, or whether a fix is in progress.

        Gary Tully added a comment -

        Peek at the test case that works on trunk: https://github.com/apache/activemq/blob/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DurableSubscriberWithNetworkDisconnectTest.java

        Note how it is configured.
        To demonstrate a bug, please make some changes to that test case or add some additional scenarios, and then reopen this issue if it breaks.

        Andreas Calvo added a comment - edited

        Gary Tully,
        Thanks for the quick reply.
        However, although the test failed when I first ran it (and now it doesn't), I don't think it truly captures the situation.
        Using a proxy can trip the inactivity monitor (even if it's paused or closed), but it does not capture an abrupt stop of the broker or its connections.
        While doing more tests in the lab, causing a socket disconnect error (that is, unplugging the network cable) seems, as Noel Ady put it, to leave the durable subscribers in a zombie state: they seem to exist and receive messages, but aren't able to dequeue anything.
        This fails as of ActiveMQ 5.7; I haven't tested 5.8 yet.

        If there is any way to cause a socket disconnect error programmatically, it may give us a hint.

        I should note that, while the results with ActiveMQ are really bad, using fuse-09-16 seems to give better results.

        Thanks!

        Noel Ady added a comment - edited

        Thanks Gary. I can confirm my results are as Andreas Calvo previously stated. The situation occurs when you pull the network plug (or reboot the hosting operating system). The problem does not occur when you stop and restart the ActiveMQ broker. Like Andreas, I have not tried this against 5.8.0. I noticed there is a fix in 5.8.0 regarding durable subs, but it is specific to offline deletions. I intend to run tests against that version before re-opening this bug.

        Gary Tully added a comment -

        There is a test case that simulates a very flaky network by randomly closing the socket locally. It uses a tcpfaulty://.. transport.
        see: org.apache.activemq.usecases.MulticastDiscoveryOnFaultyNetworkTest

        Maybe you can use some variant of that to reproduce in a unit test.
        When the network cable is unplugged, there is a delay in getting socket feedback that depends on the TCP stack settings for timeouts and retries. So from a Java app's perspective, connections that are half-closed can appear valid for some period. Whatever the case, the broker should be able to deal with it, and once the TCP stack reports the real state of the network, things should return to normal.

        Alin added a comment - edited

        I'm seeing the exact same behavior when using a javax.jms.MessageConsumer.
        Whenever I power off or unplug the master broker machine, the slave broker takes over the master role but the consumers do not reconnect to the new master broker.
        There are no logged errors or other messages on the consumer side. The consumer behaves as if it were still connected to the previous master instance, waiting for messages to arrive on the queue.

        I was thinking of implementing a thread on the consumer side that queries the JMS connection from time to time, hoping the failover mechanism will kick in that way, although I'm not convinced this is the right approach.

        The broker URL I'm using is:
        failover:(tcp://server1:61616?keepAlive=true&wireFormat.maxInactivityDuration=0,tcp://server2:61616?keepAlive=true&wireFormat.maxInactivityDuration=0)?randomize=false&updateURIsSupported=false&jms.prefetchPolicy.all=1

        The ActiveMQ version I'm using is 5.8.0.
        The client-side libraries were 5.4.2, 5.7.0 and 5.8.0.

        Timothy Bish added a comment -
        wireFormat.maxInactivityDuration=0

        This turns off the built-in keep-alive functionality, and the default TCP keepalive is usually two hours, so you'd need to tune the OS to rely on that. Why not just turn on the standard inactivity monitoring instead of trying to reinvent the wheel?
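
        For illustration, the URL above with maxInactivityDuration=0 removed, so the default OpenWire inactivity monitor (30000 ms) can detect a dead master and let failover promote the slave (a sketch, not a tested configuration):

        import javax.jms.Connection;
        import org.apache.activemq.ActiveMQConnectionFactory;

        public class FailoverWithInactivityMonitor {
            public static void main(String[] args) throws Exception {
                // Leaving wireFormat.maxInactivityDuration at its default lets the
                // client drop a silent socket and fail over to the other broker
                ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                        "failover:(tcp://server1:61616,tcp://server2:61616)"
                        + "?randomize=false&updateURIsSupported=false");
                Connection connection = factory.createConnection();
                connection.start();
            }
        }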


          People

          • Assignee: Gary Tully
          • Reporter: Syed Faraz Ali
          • Votes: 2
          • Watchers: 11
