ActiveMQ .Net
  1. ActiveMQ .Net
  2. AMQNET-294

durable subscription message loss when master broker fails to slave

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Not a Problem
    • Affects Version/s: 1.4.1
    • Fix Version/s: None
    • Component/s: ActiveMQ
    • Labels:
      None
    • Environment:

      Windows 7 (client), Windows Server 2008 64-bit (server brokers run on), Sql Server 2008 (database)

    • Regression:
      Regression, Unit Test Broken

      Description

      We are seeing message loss on a durable subscription when using NMS ActiveMQ v1.4.1 and ActiveMQ v5.4.1.

      Please run the included NUnit test and watch the console output. When it says "Failover the broker now!" do as it says. About 75% of the time less than half of the expected 250 messages come through.

      Using version 1.1 of NMS the majority of the time the test passes. I have seen it fail only a few times with this earlier version, and when it does there are only a couple messages that don't come through.

      In the zip file will be the unit test, and a config directory containing the master and slave activemq configurations. We are using JDBC master/slave.

      1. TimsTestRevisedSlightly.zip
        8 kB
        Mark Gellings
      2. NMSLogLocalFreshCannedActiveMQ542Broker.txt
        492 kB
        Mark Gellings
      3. NMSLog.txt
        491 kB
        Mark Gellings
      4. DurableSubscriberFailoverTest.java
        11 kB
        Timothy Bish
      5. DurableConsumerTest.cs
        10 kB
        Timothy Bish
      6. Apache.NMS.Test (2).zip
        3.94 MB
        Mark Gellings
      7. Apache.NMS.Test.zip
        827 kB
        Mark Gellings

        Issue Links

          Activity

          Hide
          Jan Vogelgesang added a comment -

          It looks that it can be similar problem to AMQNET-293.

          If you your connection to broker is lost when you are in OnMessage method then your consumer may not be recovered. I suspect that if you remove Sleep(200) from OnMessage it will behave better... (less time you spend in OnMessage then less chance that connection break will happen there)

          Please build NMS from trunk but use the MessageConsumer.cs file I put as the solution to the AMQNET-293 and let me know if it helps.

          Show
          Jan Vogelgesang added a comment - It looks that it can be similar problem to AMQNET-293 . If you your connection to broker is lost when you are in OnMessage method then your consumer may not be recovered. I suspect that if you remove Sleep(200) from OnMessage it will behave better... (less time you spend in OnMessage then less chance that connection break will happen there) Please build NMS from trunk but use the MessageConsumer.cs file I put as the solution to the AMQNET-293 and let me know if it helps.
          Hide
          Timothy Bish added a comment -

          I haven't been able to look into this yet. What I would do to start would be to run a similar test in java with the trace option set on the uri and compare it to what the .NET client is sending to see if we are not sending something in an OpenWire message that we should be.

          Show
          Timothy Bish added a comment - I haven't been able to look into this yet. What I would do to start would be to run a similar test in java with the trace option set on the uri and compare it to what the .NET client is sending to see if we are not sending something in an OpenWire message that we should be.
          Hide
          Mark Gellings added a comment -

          I have tested against tagged 1.1, 1.2, 1.3, and 1.4.1 versions. Only 1.4.1 do I see messages not come across.

          With 1.1, 1.2, and 1.3 a lot of duplicate messages (sometimes triplets) come across with CLIENT and INDIVIDUAL acknowledgement modes. Duplicate message count is generally equal to the prefetch count. This seems strange when previously ack'd messages are replayed. Should this be working like this? Perhaps I misunderstand message replay so forgive me if so.

          Testing 1.3 with TRANSACTIONAL acknowledgement mode and only recieve one or no duplicates. They come across correctly as redelivered.

          I have included an upgraded zip file which includes the tagged versions as well as an extra unit test. This test uses the default durable topic prefetch (100) as well as tests with 1,000 messages with a smaller delay between processing.

          Show
          Mark Gellings added a comment - I have tested against tagged 1.1, 1.2, 1.3, and 1.4.1 versions. Only 1.4.1 do I see messages not come across. With 1.1, 1.2, and 1.3 a lot of duplicate messages (sometimes triplets) come across with CLIENT and INDIVIDUAL acknowledgement modes. Duplicate message count is generally equal to the prefetch count. This seems strange when previously ack'd messages are replayed. Should this be working like this? Perhaps I misunderstand message replay so forgive me if so. Testing 1.3 with TRANSACTIONAL acknowledgement mode and only recieve one or no duplicates. They come across correctly as redelivered. I have included an upgraded zip file which includes the tagged versions as well as an extra unit test. This test uses the default durable topic prefetch (100) as well as tests with 1,000 messages with a smaller delay between processing.
          Hide
          Mark Gellings added a comment -

          Updated unit tests.

          Show
          Mark Gellings added a comment - Updated unit tests.
          Hide
          Mark Gellings added a comment -

          Tim I don't develop in Java. What IDE do you recommend? NetBeans? Eclipse? If I have time I can put together a Java client, but right now I'm testing v1.3 to see if we can get by with that version for out ActiveMQ 5.4.1 migration.

          Show
          Mark Gellings added a comment - Tim I don't develop in Java. What IDE do you recommend? NetBeans? Eclipse? If I have time I can put together a Java client, but right now I'm testing v1.3 to see if we can get by with that version for out ActiveMQ 5.4.1 migration.
          Hide
          Timothy Bish added a comment -

          Just an observation, your individual ack setup in the test case isn't actually acking the messages, so they are going to get redelivered.

          Show
          Timothy Bish added a comment - Just an observation, your individual ack setup in the test case isn't actually acking the messages, so they are going to get redelivered.
          Hide
          Mark Gellings added a comment -

          Yes good catch. I did discover that as well after uploading the test. Unfortunately acking the messages still doesn't resolve the problem for individual acknowledgement.

          I ended up accommodating for individual acknowledgement mode by adding this to the OnMessage method:

          if (_session.AcknowledgementMode == AcknowledgementMode.ClientAcknowledge ||
          _session.AcknowledgementMode == AcknowledgementMode.IndividualAcknowledge )

          { message.Acknowledge(); }
          Show
          Mark Gellings added a comment - Yes good catch. I did discover that as well after uploading the test. Unfortunately acking the messages still doesn't resolve the problem for individual acknowledgement. I ended up accommodating for individual acknowledgement mode by adding this to the OnMessage method: if (_session.AcknowledgementMode == AcknowledgementMode.ClientAcknowledge || _session.AcknowledgementMode == AcknowledgementMode.IndividualAcknowledge ) { message.Acknowledge(); }
          Hide
          Timothy Bish added a comment -

          Cleaned up the supplied unit test and then created a Java version of it.

          Show
          Timothy Bish added a comment - Cleaned up the supplied unit test and then created a Java version of it.
          Hide
          Mark Gellings added a comment - - edited

          Tim I revised your unit test slightly to get it to work for me (e.g. changing Broker Uri) and appears to still be a deadlock. Gets to 50 messages and then processes no more.

          I've attached my changes to your test.

          We had thought maybe this was because we were running the clients on Windows 7, but when running on a Server 2003 machine it still only processed up to 50 messages.

          Our brokers run on Windows Server 2008 64-bit. When you run your tests are you running the brokers locally on your machine or across a network? Also are you using the official ActiveMQ 5.4.2 like us?

          Maybe you could attach your broker config(s) and I can test on our side.

          Let me know.

          Show
          Mark Gellings added a comment - - edited Tim I revised your unit test slightly to get it to work for me (e.g. changing Broker Uri) and appears to still be a deadlock. Gets to 50 messages and then processes no more. I've attached my changes to your test. We had thought maybe this was because we were running the clients on Windows 7, but when running on a Server 2003 machine it still only processed up to 50 messages. Our brokers run on Windows Server 2008 64-bit. When you run your tests are you running the brokers locally on your machine or across a network? Also are you using the official ActiveMQ 5.4.2 like us? Maybe you could attach your broker config(s) and I can test on our side. Let me know.
          Hide
          Timothy Bish added a comment -

          I forgot to ask, why are you turning off inactivity monitoring on the one broker URI?

          Show
          Timothy Bish added a comment - I forgot to ask, why are you turning off inactivity monitoring on the one broker URI?
          Hide
          Mark Gellings added a comment -
          Show
          Mark Gellings added a comment - I experienced problems with it while testing. See http://activemq.2283324.n4.nabble.com/NMS-gt-InactivityException-td3049191.html#a3049659 .
          Hide
          Timothy Bish added a comment -

          Given the way the test case works if that's the broker that you happen to be connected to and you kill the broker the client might not notice so it could appear to be hung. The TCP keep alive is around two hours if enabled so if you never send a message the TcpTransport would just sit there waiting on receive. The three stack trace images that you attached to the other issue looks that way, its sitting waiting to receive something, not having info on the other threads it just looked to me like it was waiting for more messages.

          Show
          Timothy Bish added a comment - Given the way the test case works if that's the broker that you happen to be connected to and you kill the broker the client might not notice so it could appear to be hung. The TCP keep alive is around two hours if enabled so if you never send a message the TcpTransport would just sit there waiting on receive. The three stack trace images that you attached to the other issue looks that way, its sitting waiting to receive something, not having info on the other threads it just looked to me like it was waiting for more messages.
          Hide
          Mark Gellings added a comment -

          I was just using your unit test this time with triggered the transport to close. I wasn't actually failing the broker over.

          Also I wasn't killing the broker, just shutting down the master and allowing the slave to take over. Not sure if that would make a difference or not.

          I can try enabling the inactivity monitor and lower the keep alive period. How exactly would I do that?

          Also, was this logic changed from NMS v1.3 because that version works fine, no deadlocks or hanging when the broker fails over to the slave.

          Also, were you running your broker locally or across a network? Can you send me your broker config(s) and I can try testing under your setup?

          Show
          Mark Gellings added a comment - I was just using your unit test this time with triggered the transport to close. I wasn't actually failing the broker over. Also I wasn't killing the broker, just shutting down the master and allowing the slave to take over. Not sure if that would make a difference or not. I can try enabling the inactivity monitor and lower the keep alive period. How exactly would I do that? Also, was this logic changed from NMS v1.3 because that version works fine, no deadlocks or hanging when the broker fails over to the slave. Also, were you running your broker locally or across a network? Can you send me your broker config(s) and I can try testing under your setup?
          Hide
          Timothy Bish added a comment -

          All I did was setup a pure master/slave as described here: http://activemq.apache.org/pure-master-slave.html

          All tests are on the local machine.

          If inactivity is on then you shouldn't need to change anything, the standard period is 30secs. If you want to play with settings, the reference is here: http://activemq.apache.org/nms/configuring.html

          At this point I'm kind of confused as to what issue we are working on, Lockup is different that what the title of this issue is so if that's what's happening then maybe we need a new issue with the simplest possible test case that reproduces the issue (preferably with no master / slave stuff to confuse things) From the client side master / slave is transparent, client is just connecting and reconnecting to brokers.

          Code changed between 1.3 and 1.4 yes, bugs could be anywhere, that's why we need test cases. With version 1.4 you should running against a 5.4+ broker, otherwise client reconnect will not work right for consumers because of a fix for AMQ-2579.

          Show
          Timothy Bish added a comment - All I did was setup a pure master/slave as described here: http://activemq.apache.org/pure-master-slave.html All tests are on the local machine. If inactivity is on then you shouldn't need to change anything, the standard period is 30secs. If you want to play with settings, the reference is here: http://activemq.apache.org/nms/configuring.html At this point I'm kind of confused as to what issue we are working on, Lockup is different that what the title of this issue is so if that's what's happening then maybe we need a new issue with the simplest possible test case that reproduces the issue (preferably with no master / slave stuff to confuse things) From the client side master / slave is transparent, client is just connecting and reconnecting to brokers. Code changed between 1.3 and 1.4 yes, bugs could be anywhere, that's why we need test cases. With version 1.4 you should running against a 5.4+ broker, otherwise client reconnect will not work right for consumers because of a fix for AMQ-2579 .
          Hide
          Mark Gellings added a comment -

          As far as your confusion...the basic issue is with a durable subscription, upon failing a master broker to the slave, message consumption stops. Only until I restart the message consumer do the remaining messages come through. When the issue was originally created I did not restart the consumer after the broker fail over (as shouldn't need to), so I assumed message loss occurred. Whether the inactivity monitor is enabled or not, the messages do not come through when a broker fails over until I restart the consumer.

          My latest testing is with ActiveMQ 5.4.2 and ActiveMQ NMS trunk. I am using JDBC Master/Slave, the brokers run on Windows Server 2008 and consumer on Windows 7.

          The way I can replicate this issue is when physically failing over the broker or by using your test case.

          The only differences when I run your test case is the broker is not running locally and I am using JDBC master/slave.

          We can create a new issue if you would like. For now I will run a broker locally and run your test to see if it passes.

          One additional detail is I do see the following message in the broker log file after I fail over the broker if that means anything to you.

          2010-12-15 11:26:25,147 | INFO | Transport failed: java.io.IOException: Wire format negotiation timeout: peer did not send his wire format. | org.apache.activemq.broker.TransportConnection.Transport | Async Exception Handler

          Show
          Mark Gellings added a comment - As far as your confusion...the basic issue is with a durable subscription, upon failing a master broker to the slave, message consumption stops. Only until I restart the message consumer do the remaining messages come through. When the issue was originally created I did not restart the consumer after the broker fail over (as shouldn't need to), so I assumed message loss occurred. Whether the inactivity monitor is enabled or not, the messages do not come through when a broker fails over until I restart the consumer. My latest testing is with ActiveMQ 5.4.2 and ActiveMQ NMS trunk. I am using JDBC Master/Slave, the brokers run on Windows Server 2008 and consumer on Windows 7. The way I can replicate this issue is when physically failing over the broker or by using your test case. The only differences when I run your test case is the broker is not running locally and I am using JDBC master/slave. We can create a new issue if you would like. For now I will run a broker locally and run your test to see if it passes. One additional detail is I do see the following message in the broker log file after I fail over the broker if that means anything to you. 2010-12-15 11:26:25,147 | INFO | Transport failed: java.io.IOException: Wire format negotiation timeout: peer did not send his wire format. | org.apache.activemq.broker.TransportConnection.Transport | Async Exception Handler
          Hide
          Mark Gellings added a comment -

          Tim I've attached a log file with NMS tracing enabled...You can see it gets to message 50, then no more.

          2010-12-15 12:10:38,054 [16] [DEBUG] Apache.NMS.Tracer - Consumer: At message 50
          2010-12-15 12:10:38,054 [16] [DEBUG] Apache.NMS.Tracer - *** Failoving broker connection now! ***

          Maybe you could compare my log with a log of your own to see if there are any differences.

          Show
          Mark Gellings added a comment - Tim I've attached a log file with NMS tracing enabled...You can see it gets to message 50, then no more. 2010-12-15 12:10:38,054 [16] [DEBUG] Apache.NMS.Tracer - Consumer: At message 50 2010-12-15 12:10:38,054 [16] [DEBUG] Apache.NMS.Tracer - *** Failoving broker connection now! *** Maybe you could compare my log with a log of your own to see if there are any differences.
          Hide
          Mark Gellings added a comment -

          I see this in the ActiveMQ broker log.

          2010-12-15 12:10:38,322 | INFO | Transport failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///10.0.2.106:57473
          2010-12-15 12:10:59,153 | WARN | Async error occurred: javax.jms.JMSException: Unmatched acknowledge: MessageAck

          {commandId = 53, responseRequired = false, ackType = 2, consumerId = ID:QG076619-57468-634280118318596428-0:2:1:1, firstMessageId = ID:QG076619-57468-634280118318596428-0:1:1:1:50, lastMessageId = ID:QG076619-57468-634280118318596428-0:1:1:1:50, destination = topic://DurableConsumerTest-TopicId, transactionId = null, messageCount = 1}

          ; Could not find Message-ID ID:QG076619-57468-634280118318596428-0:1:1:1:50 in dispatched-list (start of ack) | org.apache.activemq.broker.TransportConnection.Service | ActiveMQ Transport: tcp:///10.0.2.106:57480
          javax.jms.JMSException: Unmatched acknowledge: MessageAck

          {commandId = 53, responseRequired = false, ackType = 2, consumerId = ID:QG076619-57468-634280118318596428-0:2:1:1, firstMessageId = ID:QG076619-57468-634280118318596428-0:1:1:1:50, lastMessageId = ID:QG076619-57468-634280118318596428-0:1:1:1:50, destination = topic://DurableConsumerTest-TopicId, transactionId = null, messageCount = 1}

          ; Could not find Message-ID ID:QG076619-57468-634280118318596428-0:1:1:1:50 in dispatched-list (start of ack)
          at org.apache.activemq.broker.region.PrefetchSubscription.assertAckMatchesDispatched(PrefetchSubscription.java:436)
          at org.apache.activemq.broker.region.PrefetchSubscription.acknowledge(PrefetchSubscription.java:207)
          at org.apache.activemq.broker.region.AbstractRegion.acknowledge(AbstractRegion.java:383)
          at org.apache.activemq.broker.region.RegionBroker.acknowledge(RegionBroker.java:544)
          at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java:77)
          at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java:77)
          at org.apache.activemq.broker.TransactionBroker.acknowledge(TransactionBroker.java:197)
          at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java:77)
          at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java:77)
          at org.apache.activemq.broker.MutableBrokerFilter.acknowledge(MutableBrokerFilter.java:87)
          at org.apache.activemq.broker.TransportConnection.processMessageAck(TransportConnection.java:469)
          at org.apache.activemq.command.MessageAck.visit(MessageAck.java:214)
          at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:311)
          at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:185)
          at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:69)
          at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
          at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityMonitor.java:228)
          at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
          at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:220)
          at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:202)
          at java.lang.Thread.run(Unknown Source)

          Show
          Mark Gellings added a comment - I see this in the ActiveMQ broker log. 2010-12-15 12:10:38,322 | INFO | Transport failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///10.0.2.106:57473 2010-12-15 12:10:59,153 | WARN | Async error occurred: javax.jms.JMSException: Unmatched acknowledge: MessageAck {commandId = 53, responseRequired = false, ackType = 2, consumerId = ID:QG076619-57468-634280118318596428-0:2:1:1, firstMessageId = ID:QG076619-57468-634280118318596428-0:1:1:1:50, lastMessageId = ID:QG076619-57468-634280118318596428-0:1:1:1:50, destination = topic://DurableConsumerTest-TopicId, transactionId = null, messageCount = 1} ; Could not find Message-ID ID:QG076619-57468-634280118318596428-0:1:1:1:50 in dispatched-list (start of ack) | org.apache.activemq.broker.TransportConnection.Service | ActiveMQ Transport: tcp:///10.0.2.106:57480 javax.jms.JMSException: Unmatched acknowledge: MessageAck {commandId = 53, responseRequired = false, ackType = 2, consumerId = ID:QG076619-57468-634280118318596428-0:2:1:1, firstMessageId = ID:QG076619-57468-634280118318596428-0:1:1:1:50, lastMessageId = ID:QG076619-57468-634280118318596428-0:1:1:1:50, destination = topic://DurableConsumerTest-TopicId, transactionId = null, messageCount = 1} ; Could not find Message-ID ID:QG076619-57468-634280118318596428-0:1:1:1:50 in dispatched-list (start of ack) at org.apache.activemq.broker.region.PrefetchSubscription.assertAckMatchesDispatched(PrefetchSubscription.java:436) at org.apache.activemq.broker.region.PrefetchSubscription.acknowledge(PrefetchSubscription.java:207) at org.apache.activemq.broker.region.AbstractRegion.acknowledge(AbstractRegion.java:383) at org.apache.activemq.broker.region.RegionBroker.acknowledge(RegionBroker.java:544) at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java:77) at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java:77) at org.apache.activemq.broker.TransactionBroker.acknowledge(TransactionBroker.java:197) at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java:77) at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java:77) at org.apache.activemq.broker.MutableBrokerFilter.acknowledge(MutableBrokerFilter.java:87) at org.apache.activemq.broker.TransportConnection.processMessageAck(TransportConnection.java:469) at org.apache.activemq.command.MessageAck.visit(MessageAck.java:214) at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:311) at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:185) at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:69) at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113) at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityMonitor.java:228) at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83) at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:220) at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:202) at java.lang.Thread.run(Unknown Source)
          Hide
          Timothy Bish added a comment -

          What happens if you run the Java test case I provided using your same configuration? You will probably have to make some changes to address the URI and user / pass for your broker setup.

          Show
          Timothy Bish added a comment - What happens if you run the Java test case I provided using your same configuration? You will probably have to make some changes to address the URI and user / pass for your broker setup.
          Hide
          Mark Gellings added a comment -

          I will try running the java test.

          FYI, I did just run your NMS test case against a fresh ActiveMQ 5.4.2 broker with activemq.bat--as it comes configured after extracting the zip. I ran it locally (same box as consumer) and your NMS test case still fails for me. Processes 50 messages, then stops.

          I will attach the log file.

          What version of the broker are you using when running the NMS test case and are you using Windows 7?

          Show
          Mark Gellings added a comment - I will try running the java test. FYI, I did just run your NMS test case against a fresh ActiveMQ 5.4.2 broker with activemq.bat--as it comes configured after extracting the zip. I ran it locally (same box as consumer) and your NMS test case still fails for me. Processes 50 messages, then stops. I will attach the log file. What version of the broker are you using when running the NMS test case and are you using Windows 7?
          Hide
          Mark Gellings added a comment -

          This statement in the log may be of interest.

          2010-12-15 13:17:08,032 [14] [DEBUG] Apache.NMS.Tracer - Performing Async Clear of In Progress Messages on Consumer: ID:QG076619-59054-634280158265700739-0:2:1:1

          Show
          Mark Gellings added a comment - This statement in the log may be of interest. 2010-12-15 13:17:08,032 [14] [DEBUG] Apache.NMS.Tracer - Performing Async Clear of In Progress Messages on Consumer: ID:QG076619-59054-634280158265700739-0:2:1:1
          Hide
          Mark Gellings added a comment -

          The java test passes.

          Durable Consumer Failover Test Started:
          Dec 15, 2010 1:56:57 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect
          INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true
          Dec 15, 2010 1:56:59 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect
          INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true
          Produced message 10
          Produced message 20
          Produced message 30
          Produced message 40
          Produced message 50
          Produced message 60
          Produced message 70
          Produced message 80
          Produced message 90
          Produced message 100
          Dec 15, 2010 1:56:59 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect
          INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true
          Consumer: At message 10
          Consumer: At message 20
          Consumer: At message 30
          Consumer: At message 40

              • Failoving broker connection now! ***
                Consumer: At message 50
                Consumer: At message 60
                Consumer: At message 70
                Consumer: At message 80
                Consumer: At message 90
                Consumer: At message 100
                Recieved all messages!
                Dec 15, 2010 1:57:21 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect
                INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true
                Durable Consumer Failover Test Complete:
                BUILD SUCCESSFUL (total time: 30 seconds)
          Show
          Mark Gellings added a comment - The java test passes. Durable Consumer Failover Test Started: Dec 15, 2010 1:56:57 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true Dec 15, 2010 1:56:59 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true Produced message 10 Produced message 20 Produced message 30 Produced message 40 Produced message 50 Produced message 60 Produced message 70 Produced message 80 Produced message 90 Produced message 100 Dec 15, 2010 1:56:59 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true Consumer: At message 10 Consumer: At message 20 Consumer: At message 30 Consumer: At message 40 Failoving broker connection now! *** Consumer: At message 50 Consumer: At message 60 Consumer: At message 70 Consumer: At message 80 Consumer: At message 90 Consumer: At message 100 Recieved all messages! Dec 15, 2010 1:57:21 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true Durable Consumer Failover Test Complete: BUILD SUCCESSFUL (total time: 30 seconds)
          Hide
          Timothy Bish added a comment -

          Did you actually fail the broker, I have the auto fail part commented out in the Java sample so you need to do it manually. You can comment it back in and remove the sleep from onMessage or just kill the broker when you see the "Failoving broker connection now!" trace, sorry if that's misleading, it was late.

          If you did can you please be specific about what you did, what version of the broker, what configuration etc. I'm working more than one issue so I need some help so when I get back on this I know where to start.

          On my end I generally always use the latest release or a SNAPSHOT build after latest if I need a specific fix. I've used canned config when possible as that allows us to add tests to the NUnit tests that we otherwise can't since we can't depend on specific broker configs for those tests.

          Show
          Timothy Bish added a comment - Did you actually fail the broker, I have the auto fail part commented out in the Java sample so you need to do it manually. You can comment it back in and remove the sleep from onMessage or just kill the broker when you see the "Failoving broker connection now!" trace, sorry if that's misleading, it was late. If you did can you please be specific about what you did, what version of the broker, what configuration etc. I'm working more than one issue so I need some help so when I get back on this I know where to start. On my end I generally always use the latest release or a SNAPSHOT build after latest if I need a specific fix. I've used canned config when possible as that allows us to add tests to the NUnit tests that we otherwise can't since we can't depend on specific broker configs for those tests.
          Hide
          Mark Gellings added a comment -

          Oops. Nope. When I uncomment the auto failover, then I see the same behavior as with the NMS test, except it gets to only 40 messages.

          run:
          Durable Consumer Failover Test Started:
          Dec 15, 2010 2:22:45 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect
          INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true
          Dec 15, 2010 2:22:46 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect
          INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true
          Produced message 10
          Produced message 20
          Produced message 30
          Produced message 40
          Produced message 50
          Produced message 60
          Produced message 70
          Produced message 80
          Produced message 90
          Produced message 100
          Dec 15, 2010 2:22:46 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect
          INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true
          Consumer: At message 10
          Consumer: At message 20
          Consumer: At message 30
          Consumer: At message 40

              • Failoving broker connection now! ***
                Dec 15, 2010 2:22:50 PM org.apache.activemq.transport.failover.FailoverTransport handleTransportFailure
                WARNING: Transport (/127.0.0.1:61616) failed to tcp://127.0.0.1:61616?trace=true , attempting to automatically reconnect due to: java.net.SocketException: socket closed
                Dec 15, 2010 2:22:51 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect
                INFO: Successfully reconnected to tcp://QG076619:61616
                OnException: Could not correlate acknowledgment with dispatched message: MessageAck {commandId = 8, responseRequired = false, ackType = 0, consumerId = ID:QG076619-60372-1292444565609-0:3:1:1, firstMessageId = ID:QG076619-60372-1292444565609-0:2:1:1:31, lastMessageId = ID:QG076619-60372-1292444565609-0:2:1:1:40, destination = topic://DurableConsumerTest-TopicId, transactionId = null, messageCount = 10}

                Dec 15, 2010 2:23:46 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect
                INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true
                Messages consumed were 40, not 100 as expectedDurable Consumer Failover Test Complete:
                BUILD SUCCESSFUL (total time: 1 minute 2 seconds)

          Show
          Mark Gellings added a comment - Oops. Nope. When I uncomment the auto failover, then I see the same behavior as with the NMS test, except it gets to only 40 messages. run: Durable Consumer Failover Test Started: Dec 15, 2010 2:22:45 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true Dec 15, 2010 2:22:46 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true Produced message 10 Produced message 20 Produced message 30 Produced message 40 Produced message 50 Produced message 60 Produced message 70 Produced message 80 Produced message 90 Produced message 100 Dec 15, 2010 2:22:46 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true Consumer: At message 10 Consumer: At message 20 Consumer: At message 30 Consumer: At message 40 Failoving broker connection now! *** Dec 15, 2010 2:22:50 PM org.apache.activemq.transport.failover.FailoverTransport handleTransportFailure WARNING: Transport (/127.0.0.1:61616) failed to tcp://127.0.0.1:61616?trace=true , attempting to automatically reconnect due to: java.net.SocketException: socket closed Dec 15, 2010 2:22:51 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect INFO: Successfully reconnected to tcp://QG076619:61616 OnException: Could not correlate acknowledgment with dispatched message: MessageAck {commandId = 8, responseRequired = false, ackType = 0, consumerId = ID:QG076619-60372-1292444565609-0:3:1:1, firstMessageId = ID:QG076619-60372-1292444565609-0:2:1:1:31, lastMessageId = ID:QG076619-60372-1292444565609-0:2:1:1:40, destination = topic://DurableConsumerTest-TopicId, transactionId = null, messageCount = 10} Dec 15, 2010 2:23:46 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect INFO: Successfully connected to tcp://127.0.0.1:61616?trace=true Messages consumed were 40, not 100 as expectedDurable Consumer Failover Test Complete: BUILD SUCCESSFUL (total time: 1 minute 2 seconds)
          Hide
          Mark Gellings added a comment -

          I tested this on a Windows XP SP3 machine at home. So not on our company domain in order to rule out my Windows 7 macihne at work. With a canned ActiveMQ 5.4.2 broker and your test with automated failover running on this XP machine, I get the same result, gets to 50 messages and then stops processing.

          One thing to note is I was not running pure master/slave, so for my failover URI the tcp addresses were the same, both localhost:61616. But I don't think that would make a difference at all.

          If you could look at the log files I provided Tim maybe you'll see why this isn't working for me. I don't understand why both the java and .net consumers stop processing messages for me but not for you.

          Thanks!

          Show
          Mark Gellings added a comment - I tested this on a Windows XP SP3 machine at home. So not on our company domain in order to rule out my Windows 7 macihne at work. With a canned ActiveMQ 5.4.2 broker and your test with automated failover running on this XP machine, I get the same result, gets to 50 messages and then stops processing. One thing to note is I was not running pure master/slave, so for my failover URI the tcp addresses were the same, both localhost:61616. But I don't think that would make a difference at all. If you could look at the log files I provided Tim maybe you'll see why this isn't working for me. I don't understand why both the java and .net consumers stop processing messages for me but not for you. Thanks!
          Hide
          Mark Gellings added a comment -

          For completion's sake here is what the broker spits out when running on XP:

          INFO | Started SelectChannelConnector@0.0.0.0:8161
          INFO | Transport failed: java.io.EOFException
          WARN | Async error occurred: javax.jms.JMSException: Unmatched acknowledge: Mes
          sageAck

          {commandId = 53, responseRequired = false, ackType = 2, consumerId = ID: msie6-1158-634280387557703303-0:2:1:1, firstMessageId = ID:msie6-1158-6342803875 57703303-0:1:1:1:50, lastMessageId = ID:msie6-1158-634280387557703303-0:1:1:1:50 , destination = topic://DurableConsumerTest-TopicId, transactionId = null, messa geCount = 1}

          ; Could not find Message-ID ID:msie6-1158-634280387557703303-0:1:1:1
          :50 in dispatched-list (start of ack)
          javax.jms.JMSException: Unmatched acknowledge: MessageAck

          {commandId = 53, respo nseRequired = false, ackType = 2, consumerId = ID:msie6-1158-634280387557703303- 0:2:1:1, firstMessageId = ID:msie6-1158-634280387557703303-0:1:1:1:50, lastMessa geId = ID:msie6-1158-634280387557703303-0:1:1:1:50, destination = topic://Durabl eConsumerTest-TopicId, transactionId = null, messageCount = 1}

          ; Could not find M
          essage-ID ID:msie6-1158-634280387557703303-0:1:1:1:50 in dispatched-list (start
          of ack)
          at org.apache.activemq.broker.region.PrefetchSubscription.assertAckMatch
          esDispatched(PrefetchSubscription.java:436)
          at org.apache.activemq.broker.region.PrefetchSubscription.acknowledge(Pr
          efetchSubscription.java:207)
          at org.apache.activemq.broker.region.AbstractRegion.acknowledge(Abstract
          Region.java:383)
          at org.apache.activemq.broker.region.RegionBroker.acknowledge(RegionBrok
          er.java:544)
          at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java
          :77)
          at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java
          :77)
          at org.apache.activemq.broker.TransactionBroker.acknowledge(TransactionB
          roker.java:197)
          at org.apache.activemq.broker.MutableBrokerFilter.acknowledge(MutableBro
          kerFilter.java:87)
          at org.apache.activemq.broker.TransportConnection.processMessageAck(Tran
          sportConnection.java:469)
          at org.apache.activemq.command.MessageAck.visit(MessageAck.java:214)
          at org.apache.activemq.broker.TransportConnection.service(TransportConne
          ction.java:311)
          at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportC
          onnection.java:185)
          at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilt
          er.java:69)
          at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireForm
          atNegotiator.java:113)
          at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityM
          onitor.java:228)
          at org.apache.activemq.transport.TransportSupport.doConsume(TransportSup
          port.java:83)
          at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.jav
          a:220)
          at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:
          202)
          at java.lang.Thread.run(Unknown Source)
          WARN | Failed to register MBean: org.apache.activemq:BrokerName=localhost,Type=
          Subscription,persistentMode=Durable,subscriptionID=DurableConsumerTest-Subscript
          ionId,destinationType=Topic,destinationName=DurableConsumerTest-TopicId,clientId
          =DurableConsumerTest-ClientId
          INFO | Transport failed: java.io.EOFException
          WARN | Async error occurred: javax.jms.JMSException: Unmatched acknowledge: Mes
          sageAck

          {commandId = 53, responseRequired = false, ackType = 2, consumerId = ID: msie6-1166-634280389518422679-0:2:1:1, firstMessageId = ID:msie6-1166-6342803895 18422679-0:1:1:1:50, lastMessageId = ID:msie6-1166-634280389518422679-0:1:1:1:50 , destination = topic://DurableConsumerTest-TopicId, transactionId = null, messa geCount = 1}

          ; Could not find Message-ID ID:msie6-1166-634280389518422679-0:1:1:1
          :50 in dispatched-list (start of ack)
          javax.jms.JMSException: Unmatched acknowledge: MessageAck

          {commandId = 53, respo nseRequired = false, ackType = 2, consumerId = ID:msie6-1166-634280389518422679- 0:2:1:1, firstMessageId = ID:msie6-1166-634280389518422679-0:1:1:1:50, lastMessa geId = ID:msie6-1166-634280389518422679-0:1:1:1:50, destination = topic://Durabl eConsumerTest-TopicId, transactionId = null, messageCount = 1}

          ; Could not find M
          essage-ID ID:msie6-1166-634280389518422679-0:1:1:1:50 in dispatched-list (start
          of ack)
          at org.apache.activemq.broker.region.PrefetchSubscription.assertAckMatch
          esDispatched(PrefetchSubscription.java:436)
          at org.apache.activemq.broker.region.PrefetchSubscription.acknowledge(Pr
          efetchSubscription.java:207)
          at org.apache.activemq.broker.region.AbstractRegion.acknowledge(Abstract
          Region.java:383)
          at org.apache.activemq.broker.region.RegionBroker.acknowledge(RegionBrok
          er.java:544)
          at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java
          :77)
          at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java
          :77)
          at org.apache.activemq.broker.TransactionBroker.acknowledge(TransactionB
          roker.java:197)
          at org.apache.activemq.broker.MutableBrokerFilter.acknowledge(MutableBro
          kerFilter.java:87)
          at org.apache.activemq.broker.TransportConnection.processMessageAck(Tran
          sportConnection.java:469)
          at org.apache.activemq.command.MessageAck.visit(MessageAck.java:214)
          at org.apache.activemq.broker.TransportConnection.service(TransportConne
          ction.java:311)
          at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportC
          onnection.java:185)
          at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilt
          er.java:69)
          at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireForm
          atNegotiator.java:113)
          at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityM
          onitor.java:228)
          at org.apache.activemq.transport.TransportSupport.doConsume(TransportSup
          port.java:83)
          at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.jav
          a:220)
          at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:
          202)
          at java.lang.Thread.run(Unknown Source)

          Show
          Mark Gellings added a comment - For completion's sake here is what the broker spits out when running on XP: INFO | Started SelectChannelConnector@0.0.0.0:8161 INFO | Transport failed: java.io.EOFException WARN | Async error occurred: javax.jms.JMSException: Unmatched acknowledge: Mes sageAck {commandId = 53, responseRequired = false, ackType = 2, consumerId = ID: msie6-1158-634280387557703303-0:2:1:1, firstMessageId = ID:msie6-1158-6342803875 57703303-0:1:1:1:50, lastMessageId = ID:msie6-1158-634280387557703303-0:1:1:1:50 , destination = topic://DurableConsumerTest-TopicId, transactionId = null, messa geCount = 1} ; Could not find Message-ID ID:msie6-1158-634280387557703303-0:1:1:1 :50 in dispatched-list (start of ack) javax.jms.JMSException: Unmatched acknowledge: MessageAck {commandId = 53, respo nseRequired = false, ackType = 2, consumerId = ID:msie6-1158-634280387557703303- 0:2:1:1, firstMessageId = ID:msie6-1158-634280387557703303-0:1:1:1:50, lastMessa geId = ID:msie6-1158-634280387557703303-0:1:1:1:50, destination = topic://Durabl eConsumerTest-TopicId, transactionId = null, messageCount = 1} ; Could not find M essage-ID ID:msie6-1158-634280387557703303-0:1:1:1:50 in dispatched-list (start of ack) at org.apache.activemq.broker.region.PrefetchSubscription.assertAckMatch esDispatched(PrefetchSubscription.java:436) at org.apache.activemq.broker.region.PrefetchSubscription.acknowledge(Pr efetchSubscription.java:207) at org.apache.activemq.broker.region.AbstractRegion.acknowledge(Abstract Region.java:383) at org.apache.activemq.broker.region.RegionBroker.acknowledge(RegionBrok er.java:544) at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java :77) at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java :77) at org.apache.activemq.broker.TransactionBroker.acknowledge(TransactionB roker.java:197) at org.apache.activemq.broker.MutableBrokerFilter.acknowledge(MutableBro kerFilter.java:87) at org.apache.activemq.broker.TransportConnection.processMessageAck(Tran sportConnection.java:469) at org.apache.activemq.command.MessageAck.visit(MessageAck.java:214) at org.apache.activemq.broker.TransportConnection.service(TransportConne ction.java:311) at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportC onnection.java:185) at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilt er.java:69) at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireForm atNegotiator.java:113) at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityM onitor.java:228) at org.apache.activemq.transport.TransportSupport.doConsume(TransportSup port.java:83) at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.jav a:220) at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java: 202) at java.lang.Thread.run(Unknown Source) WARN | Failed to register MBean: org.apache.activemq:BrokerName=localhost,Type= Subscription,persistentMode=Durable,subscriptionID=DurableConsumerTest-Subscript ionId,destinationType=Topic,destinationName=DurableConsumerTest-TopicId,clientId =DurableConsumerTest-ClientId INFO | Transport failed: java.io.EOFException WARN | Async error occurred: javax.jms.JMSException: Unmatched acknowledge: Mes sageAck {commandId = 53, responseRequired = false, ackType = 2, consumerId = ID: msie6-1166-634280389518422679-0:2:1:1, firstMessageId = ID:msie6-1166-6342803895 18422679-0:1:1:1:50, lastMessageId = ID:msie6-1166-634280389518422679-0:1:1:1:50 , destination = topic://DurableConsumerTest-TopicId, transactionId = null, messa geCount = 1} ; Could not find Message-ID ID:msie6-1166-634280389518422679-0:1:1:1 :50 in dispatched-list (start of ack) javax.jms.JMSException: Unmatched acknowledge: MessageAck {commandId = 53, respo nseRequired = false, ackType = 2, consumerId = ID:msie6-1166-634280389518422679- 0:2:1:1, firstMessageId = ID:msie6-1166-634280389518422679-0:1:1:1:50, lastMessa geId = ID:msie6-1166-634280389518422679-0:1:1:1:50, destination = topic://Durabl eConsumerTest-TopicId, transactionId = null, messageCount = 1} ; Could not find M essage-ID ID:msie6-1166-634280389518422679-0:1:1:1:50 in dispatched-list (start of ack) at org.apache.activemq.broker.region.PrefetchSubscription.assertAckMatch esDispatched(PrefetchSubscription.java:436) at org.apache.activemq.broker.region.PrefetchSubscription.acknowledge(Pr efetchSubscription.java:207) at org.apache.activemq.broker.region.AbstractRegion.acknowledge(Abstract Region.java:383) at org.apache.activemq.broker.region.RegionBroker.acknowledge(RegionBrok er.java:544) at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java :77) at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java :77) at org.apache.activemq.broker.TransactionBroker.acknowledge(TransactionB roker.java:197) at org.apache.activemq.broker.MutableBrokerFilter.acknowledge(MutableBro kerFilter.java:87) at org.apache.activemq.broker.TransportConnection.processMessageAck(Tran sportConnection.java:469) at org.apache.activemq.command.MessageAck.visit(MessageAck.java:214) at org.apache.activemq.broker.TransportConnection.service(TransportConne ction.java:311) at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportC onnection.java:185) at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilt er.java:69) at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireForm atNegotiator.java:113) at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityM onitor.java:228) at org.apache.activemq.transport.TransportSupport.doConsume(TransportSup port.java:83) at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.jav a:220) at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java: 202) at java.lang.Thread.run(Unknown Source)
          Hide
          Timothy Bish added a comment -

          So now you are seeing exactly what I see, .NET and Java behave the same, on my end I see both .NET and Java client stop receiving after the failover indicating that the problem is not on the client end.

          Show
          Timothy Bish added a comment - So now you are seeing exactly what I see, .NET and Java behave the same, on my end I see both .NET and Java client stop receiving after the failover indicating that the problem is not on the client end.
          Hide
          Mark Gellings added a comment -

          Okay. I had thought things were working fine for you...in AMQNET-293 you had said "I haven't been able to reproduce any sort of deadlock using latest trunk and the provided tests in AMQNET-294". So good to know you are actually seeing the problem on your end too.

          I did test with ActiveMQ v5.3.2 and the same problem occurred.

          It does appear to be something in combination of the clients and broker end as the tagged 1.1, 1.2, and 1.3 versions of NMS do not have this problem when running against the ActiveMQ v5.4.2 broker.

          Show
          Mark Gellings added a comment - Okay. I had thought things were working fine for you...in AMQNET-293 you had said "I haven't been able to reproduce any sort of deadlock using latest trunk and the provided tests in AMQNET-294 ". So good to know you are actually seeing the problem on your end too. I did test with ActiveMQ v5.3.2 and the same problem occurred. It does appear to be something in combination of the clients and broker end as the tagged 1.1, 1.2, and 1.3 versions of NMS do not have this problem when running against the ActiveMQ v5.4.2 broker.
          Hide
          Timothy Bish added a comment -

          Sorry, I was interpreting the use of deadlock as an actual deadlock in the client and not just the client not receiving any more messages.

          I think the older NMS clients weren't clearing out their in progress message list correctly so you probably where getting the old messages delivered. You might see unmatched message acks on the broker with the older ones.

          Show
          Timothy Bish added a comment - Sorry, I was interpreting the use of deadlock as an actual deadlock in the client and not just the client not receiving any more messages. I think the older NMS clients weren't clearing out their in progress message list correctly so you probably where getting the old messages delivered. You might see unmatched message acks on the broker with the older ones.
          Hide
          Mark Gellings added a comment - - edited

          When testing with the older NMS clients all the messages come through and the NUNit test passes. This is even with just a prefetch of 5 in which all the messages should not have been sent to the client. So the test gets to 50 messages, fails over, and then message 50 and above come through. The test only adds messages to the received message list if the id of the message is different, which asserts that all the unique messages came through.

          short id = Convert.ToInt16(((ActiveMQTextMessage) message).Text);
          if(!consumedMessageList.Contains(id))

          { consumedMessageList.Add(id); }

          Maybe the problem is due to a newer version of the OpenWire protocol being used? Should an issue be created for the java client too, or created for the broker?

          What is your opinion of rolling with NMS 1.3 and ActiveMQ 5.4.2 instead of NMS 1.4.1? We're moving to a later version of ActiveMQ due to bug https://issues.apache.org/jira/browse/AMQ-2082. We have an idempotent message pattern built into our framework resting on top of NMS to detect duplicate messages and send them through as redelivered so the clearing out of the in progress message list may not be of concern.

          Show
          Mark Gellings added a comment - - edited When testing with the older NMS clients all the messages come through and the NUNit test passes. This is even with just a prefetch of 5 in which all the messages should not have been sent to the client. So the test gets to 50 messages, fails over, and then message 50 and above come through. The test only adds messages to the received message list if the id of the message is different, which asserts that all the unique messages came through. short id = Convert.ToInt16(((ActiveMQTextMessage) message).Text); if(!consumedMessageList.Contains(id)) { consumedMessageList.Add(id); } Maybe the problem is due to a newer version of the OpenWire protocol being used? Should an issue be created for the java client too, or created for the broker? What is your opinion of rolling with NMS 1.3 and ActiveMQ 5.4.2 instead of NMS 1.4.1? We're moving to a later version of ActiveMQ due to bug https://issues.apache.org/jira/browse/AMQ-2082 . We have an idempotent message pattern built into our framework resting on top of NMS to detect duplicate messages and send them through as redelivered so the clearing out of the in progress message list may not be of concern.
          Hide
          Timothy Bish added a comment -

          Certainly is interesting. Have you tried using the java test case with the activemq jars from 5.3 against a 5.4.2 broker, would be an interesting experiment. I don't remember what all changed from openwire 5 to 6 but I don't think it was all that much.

          Will have to run some tests to capture the openwire command flow with different versions.

          Show
          Timothy Bish added a comment - Certainly is interesting. Have you tried using the java test case with the activemq jars from 5.3 against a 5.4.2 broker, would be an interesting experiment. I don't remember what all changed from openwire 5 to 6 but I don't think it was all that much. Will have to run some tests to capture the openwire command flow with different versions.
          Hide
          Mark Gellings added a comment -

          I can try running the java test case with the older jars.

          What is your opinion of rolling with NMS 1.3 and ActiveMQ 5.4.2 instead of NMS 1.4.1?

          Show
          Mark Gellings added a comment - I can try running the java test case with the older jars. What is your opinion of rolling with NMS 1.3 and ActiveMQ 5.4.2 instead of NMS 1.4.1?
          Hide
          Timothy Bish added a comment -

          Kind of depends on what your needs are, there's a bunch of bugs in the NMS 1.3.0 Uri handling so if you need to set Uri options especially on transports inside the failover transport Uri, that won't work so well. If you want Message priority support then you'd need 1.4+.

          Basically look at the issues in the release and see if anything affects your app.

          Show
          Timothy Bish added a comment - Kind of depends on what your needs are, there's a bunch of bugs in the NMS 1.3.0 Uri handling so if you need to set Uri options especially on transports inside the failover transport Uri, that won't work so well. If you want Message priority support then you'd need 1.4+. Basically look at the issues in the release and see if anything affects your app.
          Hide
          Mark Gellings added a comment -

          Tim do you have any idea when you will be able to look into this more?

          Show
          Mark Gellings added a comment - Tim do you have any idea when you will be able to look into this more?
          Hide
          Timothy Bish added a comment -

          I played with it some this morning. The Java tests will receive all messages if you run with 5.3.2 jars, and not receive when run with 5.4.2 jars, so it behaves the same as .NET. It appears that something changes when the client registers with the prefetch set to zero and then restores it with ConsumerControl. Haven't had time to dig much deeper yet.

          Show
          Timothy Bish added a comment - I played with it some this morning. The Java tests will receive all messages if you run with 5.3.2 jars, and not receive when run with 5.4.2 jars, so it behaves the same as .NET. It appears that something changes when the client registers with the prefetch set to zero and then restores it with ConsumerControl. Haven't had time to dig much deeper yet.
          Hide
          Timothy Bish added a comment -

          Have a couple other tasks ahead of this in the Queue, will get back to it tomorrow probably.

          Show
          Timothy Bish added a comment - Have a couple other tasks ahead of this in the Queue, will get back to it tomorrow probably.
          Hide
          Timothy Bish added a comment -

          Add a small fix that addresses part of the problem to trunk. Still possible to the stray MessageAck to make it over to the broker if the disconnect / connect happens very quickly at the right time.

          Show
          Timothy Bish added a comment - Add a small fix that addresses part of the problem to trunk. Still possible to the stray MessageAck to make it over to the broker if the disconnect / connect happens very quickly at the right time.
          Hide
          Mark Gellings added a comment -

          Tim I'm currently testing NMS 1.3 for our needs. It sounds like the permanent fix for this is close. I'm in a crunch to make the call as to whether we go with NMS 1.3 or not. We are stress testing significantly with 500,000+ message runs, restarts of brokers/consumers, topics, queues, networks, etc. If we can get the fix for this done ASAP I can make the switch and try and sign off on 1.4.1. I think this would be a win/win as we get to use the latest version and significant testing is done for the community.

          Thanks and let me know your thoughts!

          Show
          Mark Gellings added a comment - Tim I'm currently testing NMS 1.3 for our needs. It sounds like the permanent fix for this is close. I'm in a crunch to make the call as to whether we go with NMS 1.3 or not. We are stress testing significantly with 500,000+ message runs, restarts of brokers/consumers, topics, queues, networks, etc. If we can get the fix for this done ASAP I can make the switch and try and sign off on 1.4.1. I think this would be a win/win as we get to use the latest version and significant testing is done for the community. Thanks and let me know your thoughts!
          Hide
          Timothy Bish added a comment -

          You are welcome to take what's in trunk right now and do some testing. There is still an edge case that allows a stale messageAck to get to the broker which is the root of the problem in both Java and .Net clients. The fix that's in there now will catch some of them, but I still need to figure out a way to get the reset without impacting anything else. If 1.4.1 does everything else you need it to then I'd go with that, the way that 1.3 is doing the failover can cause issues inside of transacted sessions such as missing messages so it has its own set of problems.

          This should all be smoothed out in 1.5 which I hope to release sometime in January once I get the Distributed TX stuff ironed out.

          Show
          Timothy Bish added a comment - You are welcome to take what's in trunk right now and do some testing. There is still an edge case that allows a stale messageAck to get to the broker which is the root of the problem in both Java and .Net clients. The fix that's in there now will catch some of them, but I still need to figure out a way to get the reset without impacting anything else. If 1.4.1 does everything else you need it to then I'd go with that, the way that 1.3 is doing the failover can cause issues inside of transacted sessions such as missing messages so it has its own set of problems. This should all be smoothed out in 1.5 which I hope to release sometime in January once I get the Distributed TX stuff ironed out.
          Hide
          Timothy Bish added a comment -

          One way to deal with the Durable Consumers right now is to listen for the transport interrupted event and close them down and recreate them, then they will get all the backlogged messages, if you add a second run of the consumer to the test case that doesn't fail you'll see all its remained unacked messages come in normally.

          Show
          Timothy Bish added a comment - One way to deal with the Durable Consumers right now is to listen for the transport interrupted event and close them down and recreate them, then they will get all the backlogged messages, if you add a second run of the consumer to the test case that doesn't fail you'll see all its remained unacked messages come in normally.
          Hide
          Mark Gellings added a comment -

          I can certainly listen for transport interrupted events? Can you reply with a quick code example of how to do this and I will test it out.

          We generally do not use transacted sessions so going with NMS 1.3 would be okay.

          Show
          Mark Gellings added a comment - I can certainly listen for transport interrupted events? Can you reply with a quick code example of how to do this and I will test it out. We generally do not use transacted sessions so going with NMS 1.3 would be okay.
          Hide
          Timothy Bish added a comment -

          Here's a simple consumer that listens to the events, you just need to code up the close of the MessageConsumer and recreate it in the callback.

          
          using System;
          using System.Threading;
          using System.Text;
          using System.Net;
          using Apache.NMS;
          using Apache.NMS.ActiveMQ;
          using Apache.NMS.ActiveMQ.Commands;
          
          namespace SimpleConsumer
          {       
              public class SimpleConsumer
              {
                  private IConnection connection;
                  private ISession session;
                  private IMessageConsumer consumer;
                  private IDestination destination;
                  
                  public SimpleConsumer()
                  {
                  }
          
                  public void Connect()
                  {
                      Apache.NMS.ActiveMQ.ConnectionFactory factory = new ConnectionFactory(
                          "failover:(tcp://localhost:61616?keepAlive=true&wireFormat.maxInactivityDuration=300000&wireFormat.tcpNoDelayEnabled=true)?initialReconnectDelay=100&randomize=false&timeout=1500");
                      connection = factory.CreateConnection();
                      connection.ConnectionInterruptedListener += new ConnectionInterruptedListener(OnTransportInterrupted);
                      connection.ConnectionResumedListener += new ConnectionResumedListener(OnTransportResumed);
                      session = connection.CreateSession(AcknowledgementMode.AutoAcknowledge);
                      destination = session.GetQueue("TEST-FOO");
                      consumer = session.CreateConsumer(destination);
                      consumer.Listener += new MessageListener(OnMessage);
                  }
          
                  protected void OnTransportInterrupted()
                  {
                      Console.WriteLine("***** Connection Reports the Transport Was Interrupted. *****");
                  }
          
                  protected void OnTransportResumed()
                  {
                      Console.WriteLine("***** Connection Reports the Transport Was Restored. *****");
                  }
          
                  protected void OnMessage(IMessage message)
                  {
                      bool wasCompressed = (message as ActiveMQMessage).Compressed;
                      string compressedState = wasCompressed ? "Compressed" : "Uncompressed";
          
                      Console.WriteLine("OnMessage Called with a {0} message of Type: {1}",
                                        compressedState, message.GetType().Name);
          
                      if(message is ITextMessage)
                      {
                          ITextMessage textMsg = message as ITextMessage;
                          Console.WriteLine("Message was ITextMessage, value of Text = " + textMsg.Text);
                      }
                  }
          
                  public void Close()
                  {
                      consumer.Close();
                      session.Close();
                      connection.Close();
                  }
                  
                  public void Run()
                  {
                      connection.Start();
          
                      while(!Console.KeyAvailable)
                      {
                          Thread.Sleep(1000);
                      }
                  }
              }
          }
          
          
          
          Show
          Timothy Bish added a comment - Here's a simple consumer that listens to the events, you just need to code up the close of the MessageConsumer and recreate it in the callback. using System; using System.Threading; using System.Text; using System.Net; using Apache.NMS; using Apache.NMS.ActiveMQ; using Apache.NMS.ActiveMQ.Commands; namespace SimpleConsumer { public class SimpleConsumer { private IConnection connection; private ISession session; private IMessageConsumer consumer; private IDestination destination; public SimpleConsumer() { } public void Connect() { Apache.NMS.ActiveMQ.ConnectionFactory factory = new ConnectionFactory( "failover:(tcp://localhost:61616?keepAlive=true&wireFormat.maxInactivityDuration=300000&wireFormat.tcpNoDelayEnabled=true)?initialReconnectDelay=100&randomize=false&timeout=1500"); connection = factory.CreateConnection(); connection.ConnectionInterruptedListener += new ConnectionInterruptedListener(OnTransportInterrupted); connection.ConnectionResumedListener += new ConnectionResumedListener(OnTransportResumed); session = connection.CreateSession(AcknowledgementMode.AutoAcknowledge); destination = session.GetQueue("TEST-FOO"); consumer = session.CreateConsumer(destination); consumer.Listener += new MessageListener(OnMessage); } protected void OnTransportInterrupted() { Console.WriteLine("***** Connection Reports the Transport Was Interrupted. *****"); } protected void OnTransportResumed() { Console.WriteLine("***** Connection Reports the Transport Was Restored. *****"); } protected void OnMessage(IMessage message) { bool wasCompressed = (message as ActiveMQMessage).Compressed; string compressedState = wasCompressed ? "Compressed" : "Uncompressed"; Console.WriteLine("OnMessage Called with a {0} message of Type: {1}", compressedState, message.GetType().Name); if(message is ITextMessage) { ITextMessage textMsg = message as ITextMessage; Console.WriteLine("Message was ITextMessage, value of Text = " + textMsg.Text); } } public void Close() { consumer.Close(); session.Close(); connection.Close(); } public void Run() { connection.Start(); while(!Console.KeyAvailable) { Thread.Sleep(1000); } } } }
          Hide
          Timothy Bish added a comment -

          This appears to be a problem with the broker recovering the subscription. Opened AMQ-3106 to track the problem.

          Show
          Timothy Bish added a comment - This appears to be a problem with the broker recovering the subscription. Opened AMQ-3106 to track the problem.
          Hide
          Timothy Bish added a comment -

          Marking fix version as unknown as this is really a broker issue and won't be fixed until AMQ-3106 is fixed.

          Show
          Timothy Bish added a comment - Marking fix version as unknown as this is really a broker issue and won't be fixed until AMQ-3106 is fixed.
          Hide
          Jim Gomes added a comment -

          As near as I can tell, the remaining problem is not a .NET client issue that we can fix. If there are no objections, I'd like to resolve this issue out of the AMQNET database and have the issue tracked via the related ActiveMQ issue.

          Show
          Jim Gomes added a comment - As near as I can tell, the remaining problem is not a .NET client issue that we can fix. If there are no objections, I'd like to resolve this issue out of the AMQNET database and have the issue tracked via the related ActiveMQ issue.
          Hide
          Timothy Bish added a comment -

          This issue is tracked by AMQ-3106 and is not fixable on the .NET client side. I'm closing this out as there isn't anything more we can do here.

          Show
          Timothy Bish added a comment - This issue is tracked by AMQ-3106 and is not fixable on the .NET client side. I'm closing this out as there isn't anything more we can do here.

            People

            • Assignee:
              Jim Gomes
              Reporter:
              Mark Gellings
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development