ActiveMQ / AMQ-2009

Problem with message dispatch after a while

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.1.0, 5.2.0
    • Fix Version/s: NEEDS_REVIEW
    • Component/s: Broker
    • Labels:
      None

      Description

      Messages stop being dispatched after a while (although the broker still accepts new incoming messages) until the broker is restarted. This problem is described in several posts:

      http://www.nabble.com/Pending-Messages-are-shown-in-ActiveMQ-td20241332.html

      http://www.nabble.com/Consumer-Listener-stop-receving-message-until-ActiveMQ-restart-td20355247.html

      http://www.nabble.com/Stuck-messages---Dispatch-issues-td20467949.html

      An issue was also opened in the Spring project for this, on the assumption that it was a Spring problem.

      http://jira.springframework.org/browse/SPR-5110

      I am not able to reproduce it with a JUnit test case that starts BrokerService within the test case; I guess I am not hitting the right stress conditions this way. But when I run the test case against an externally running ActiveMQ instance backed by Oracle database persistence, it is reproducible most of the time. It is not an every-time failure; sometimes it takes longer to show up than others.

      I was able to hit this situation of stuck messages on the queue using the following scenario most of the time:

      1) Start 2 concurrent consumers for the queue using Spring's DefaultMessageListenerContainer with cacheLevelName set to CACHE_CONSUMER (see the sketch after this list)
      2) Send messages to the queue on a standalone ActiveMQ broker instance using JMeter 2.3.2 with 50 threads looping 20 times
      3) After a while, you will notice that Spring logs that no messages are being received, yet the messages are still visible in ActiveMQ's jconsole view and in the database backing it for persistence.
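
      A minimal sketch of the step-1 listener configuration is shown below. This is not the reporter's actual configuration; the broker URL, queue name, and listener bean are placeholders.

          <!-- connection factory, queue name and listener bean are assumptions, not from the original report -->
          <bean id="connectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
              <property name="brokerURL" value="tcp://localhost:61616"/>
          </bean>

          <bean id="listenerContainer"
                class="org.springframework.jms.listener.DefaultMessageListenerContainer">
              <property name="connectionFactory" ref="connectionFactory"/>
              <property name="destinationName" value="test.queue"/>
              <property name="concurrentConsumers" value="2"/>
              <property name="cacheLevelName" value="CACHE_CONSUMER"/>
              <property name="messageListener" ref="myListener"/>
          </bean>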

      In 5.2 RC3, the problem instead is that the broker dispatches duplicate messages and does not properly remove them from its database after acknowledgement.

      The attached test case might help to reproduce the problem when run against an externally running standalone ActiveMQ broker. Another way to see it is to load test with JMeter by sending messages to a queue that has a Camel route moving messages from that queue to another: you will notice that it stops moving messages after a while, or copies duplicates in the case of 5.2 RC3.
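
      For reference, a minimal sketch of the kind of Camel route described above; the queue names and the "activemq" component id are placeholders, not taken from the report.

          <camelContext xmlns="http://camel.apache.org/schema/spring">
              <route>
                  <!-- move every message from one queue to another -->
                  <from uri="activemq:queue:source.queue"/>
                  <to uri="activemq:queue:target.queue"/>
              </route>
          </camelContext>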

      Sorry about such a huge description, but it is a real problem! A different team at our company is having this issue in production with 5.1, using an embedded broker with Derby for persistence.

      1. AMQ-2009Testcase2.zip
        7 kB
        Torsten Mielke
      2. consumertest.zip
        3 kB
        Brecht Yperman
      3. DispatchMultipleConsumersTest.java
        6 kB
        Rajani Chennamaneni
      4. JConsole-screenshot.jpg
        105 kB
        Maurits Lucas
      5. testcase.zip
        9 kB
        Torsten Mielke

        Issue Links

          Activity

          Eduardo Teodoro Silva Júnior added a comment -

          I suspect that this problem is related to the storage limit. If you set the storeUsage limit to 5 GB, you can easily reproduce this problem by producing about 50000 items without a consumer.
          When the consumer starts, it does not receive any messages.

          <storeUsage>
          <storeUsage limit="5 gb"/>
          </storeUsage>
          <tempUsage>
          <tempUsage limit="1 gb"/>
          </tempUsage>
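
          For context, these elements normally sit inside the broker's <systemUsage> section of activemq.xml. A sketch using the limits quoted above (everything else omitted):

          <systemUsage>
              <systemUsage>
                  <storeUsage>
                      <storeUsage limit="5 gb"/>
                  </storeUsage>
                  <tempUsage>
                      <tempUsage limit="1 gb"/>
                  </tempUsage>
              </systemUsage>
          </systemUsage>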

          Rural Hunter added a comment -

          But if new messages are enqueued, the old stuck message can be consumed. It seems there is a threshold, and only when it is reached will ActiveMQ dispatch the message.

          Rural Hunter added a comment -

          Me too, with 5.9.0. I always see several messages (usually fewer than 5) stuck in the queue while there are active consumers. My consumers are implemented with MessageListener. This is not a problem caused only by multiple consumers/producers; I have seen it on a queue with only one producer and one consumer. For example, in my latest case the queue was enqueued with only 3 messages in total, 2 were consumed, but one seems left in the queue forever. Sometimes the stuck message can be consumed by restarting the consumer, but sometimes that doesn't help.

          Eduardo Teodoro Silva Júnior added a comment -

          I'm facing the same problem with version 5.9.0.

          Timothy Bish added a comment -

          If you are seeing an issue then the best thing to do is attach a JUnit test case that can reproduce the issue.

          Kathyayani Prasad added a comment -

          Has this issue been resolved yet? I am using ActiveMQ v5.6 and still facing the same issue. Any inputs on this will be very helpful!

          Raphael Olszewski added a comment - - edited

          I'll try, but it will be hard, because it usually only occurs while working on many messages, and TRACE level generates a really huge amount of data.
          But I'll try to catch one with the TRACE-level options you suggested.

          Brecht Yperman added a comment -

          I am out of the office and will have limited access to my email.

          I will be back on Nov/21/11.

          Please contact Vincent Van der Linden (vincent.vanderlinden@invenso.com) for urgent matters

          Gary Tully added a comment -

          Can you try to reproduce with additional logging, enabling TRACE-level logging for:

          org.apache.activemq.broker.TransportConnection
          org.apache.activemq.broker.region.cursors
          org.apache.activemq.broker.region.Queue
          org.apache.activemq.broker.region.PrefetchSubscription

          Also check the client logs for any WARN-level messages.
          Then, if you can identify the message id of the stuck message so we can correlate it with the additional logging, we may be able to determine what is going on.
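
          One way to enable that logging, assuming the broker's stock log4j setup, is to add entries like these to conf/log4j.properties (a sketch, not part of the original comment):

          # TRACE logging for the classes listed above
          log4j.logger.org.apache.activemq.broker.TransportConnection=TRACE
          log4j.logger.org.apache.activemq.broker.region.cursors=TRACE
          log4j.logger.org.apache.activemq.broker.region.Queue=TRACE
          log4j.logger.org.apache.activemq.broker.region.PrefetchSubscription=TRACE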

          Raphael Olszewski added a comment - - edited

          We have similar problems.
          We connect about 50 consumers to one queue. A few messages get stuck in the queue after a while. I cannot see any issue in amq.log.
          These messages stay in the queue until a restart. Another workaround seems to be moving these messages to another queue and moving them back - then they are consumed immediately.
          But the workaround of moving them out of and back into the queue leads to a negative "Number Of Pending Messages", which seems to be another bug.
          And we have had these problems from AMQ 5.1.0 up to AMQ 5.5.1.

          Nageswara P Nakarikanti added a comment -


          FYI, a similar issue reported by me: https://issues.apache.org/jira/browse/AMQ-1751

          Rob added a comment -

          We did indeed have horrible problems where hundreds of thousands of messages would be pumping along all week... and maybe once or twice a week, a particular queue/topic would lock up. The rest of the queues and topics would be okay, but that one would lock up. We went through about three months trying to work it from our side, knowing that 5.1.0 had many suspicious issues. We however are in production and upgrading was forbidden. When we finally did upgrade, the problem which got us for months disappeared and hasn't shown back up in, say, 6 months now. So in our case, 5.1.0 -> 5.3 was a life saver.

          Pavel Moravec added a comment -

          Can somebody please comment on whether the upgrade to 5.3.1 helped in their case? We are facing the same issue, and prior to a deep investigation we plan to upgrade from 5.2 to 5.3.1, so feedback on whether (or in how many cases) the upgrade helps would be more than welcome.

          SuoNayi added a comment -

          Gary Tully:
          With selectors, why not add a temporary strategy that triggers message dispatch periodically, if there is no better idea?

          Gary Tully added a comment -

          For users of 5.2.0, please try 5.3.1 - there have been lots of related issues resolved. https://issues.apache.org/activemq/secure/ReleaseNote.jspa?projectId=10520&styleName=Html&version=12183

          Gary Tully added a comment - - edited

          With selectors, if a matching message does not fit within the maxPageSize for a queue, it will not be consumed until the messages ahead of it are consumed. With lots of sparse selectors this can lead to a hang.
          Making maxPageSize larger than the default of 200 will help with that. To make it 1000 for all queue destinations use:

                  <destinationPolicy>
                      <policyMap>
                          <policyEntries>
                              <policyEntry queue=">" maxPageSize="1000"/>
                          </policyEntries>
                      </policyMap>
                  </destinationPolicy>
          

          A more detailed explanation can be found at http://trenaman.blogspot.com/2009/01/message-selectors-and-activemq.html

          SuoNayi added a comment -

          Yes, I know about this and use the default value, but the producer is not fast as far as I know.
          This case can be reproduced when the broker's keepDurableSubsActive is set to false.
          It seems that dispatching is blocked by FilePendingMessageCursor.
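
          For reference, a sketch of that broker attribute as it would appear in activemq.xml; the broker name is a placeholder and the rest of the configuration is omitted:

              <broker xmlns="http://activemq.apache.org/schema/core"
                      brokerName="localhost"
                      keepDurableSubsActive="false">
                  <!-- ... transport connectors, persistence adapter, etc. ... -->
              </broker>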

          ying added a comment -

          Have you tried setting producerFlowControl to false? (The default is true, and the broker just keeps silent; this usually happens with fast producers and slow consumers, where messages pile up and no dispatch happens at all.)

          <policyEntry queue=">" producerFlowControl="false">
          <dispatchPolicy>
          <roundRobinDispatchPolicy/>
          </dispatchPolicy>
          </policyEntry>

          John Newman added a comment -

          Is anyone using activemq in a very high volume situation and NOT running into this after a couple days? It seems like there should be a great amount of 'outrage' over this one, as it is quite disruptive. Wouldn't this be affecting just about everyone?

          We are using activemq 5.2 in production. 3 sites have lower volume and we don't really have any problems. One site has high volume, and after 2 or 3 days at most, it will just fall asleep. No errors, no out of memory or anything, logs remain totally healthy. The messages continue to make it into the activemq DB without problems, it grows and grows, but yet messages just stop going from activemq to the subscriber.

          Because this is happening so regularly, the root problem isn't even understood, a fix is a long way away, and we can't "just restart" the broker in production without going through a bunch of hoops, we've deemed it not fit for production and now have to figure out a different queuing solution. =(

          SuoNayi added a comment - - edited

          I have encountered the same problem three times.

          A standalone AMQ (5.2.0) server and three durable subscribers, not using selectors (because of the bug with message selectors).

          It usually works fine (sending or receiving messages).

          A few days later, I use one of the subscribers to send a message successfully, but the subscribers cannot receive any messages at all.

          Restarting the subscriber does not help.
          Restarting the broker makes the subscribers receive the message.

          But I do not know what is wrong.
          I'm really going crazy!
          Any help?

          Gary Tully added a comment -

          pushing this out to 5.4.0 as more information and analysis is needed. The most recent comments may point to a different issue altogether.

          ying added a comment -

          I have a simple scenario which can cause the dispatch problem (a config sketch follows this comment):

          1. set up a network of broker1 and broker2, bridged by multicast discovery
          2. make a producer send 5 msgs to queueA on broker2
          3. make a consumer consume from queueA on broker1 (make it slow, so it only consumes 1 msg), but make sure all 5 msgs from broker2 are forwarded to broker1
          4. stop the consumer on broker1, restart it to consume from queueA on broker2
          5. the 4 msgs originally published to broker2, forwarded to broker1, and not yet consumed will be stuck on broker1 and will not be forwarded back to broker2 for the consumer to consume.

          In this case you can only clear out the 4 leftover stuck msgs by making the consumer consume from broker1 again.

          This is a very critical issue, because you might restart your application many times and run into this.

          I will look into how the demand forwarding bridge handles this case. If you know anything about this, please help; this is very urgent. Thanks.
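
          A minimal sketch of the step-1 bridging as it might appear in each broker's activemq.xml; the connector name, port, and multicast group are assumptions:

              <transportConnectors>
                  <!-- advertise this broker over multicast so peers can discover it -->
                  <transportConnector name="openwire" uri="tcp://0.0.0.0:61616" discoveryUri="multicast://default"/>
              </transportConnectors>

              <networkConnectors>
                  <!-- establish a demand-forwarding bridge to discovered peers -->
                  <networkConnector uri="multicast://default"/>
              </networkConnectors>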

          ying added a comment -

          I see this in trunk revision 771718. Our network of 4 brokers had been running for almost 30 days; then we saw that one broker was not dispatching msgs. It has about 400 MB of data in its data directory.

          We restart the application to talk to another broker and restart the problematic broker; it starts fine, bridges OK, and sees the consumer on the other broker, but all the msgs stuck on this broker are not dispatched. They are stuck.

          I will try to gather more information when I have more ideas. A simple unit test of the case is OK with fewer msgs. I suspect this happens when a lot of msgs are stuck on the queue; then it fails to dispatch even after a restart. BTW, we have enough memory on the box for the broker to run, and jconsole shows it is using only a portion. There is no error in the log with debug turned on.

          DemandForwardingBridgeSupport's serviceLocalCommand is supposed to be called but is not. Any possible threading issue? Regarding the demand forwarding bridge, do you know the name of the thread I should look for in jconsole? Due to the complexity of our system, there are about 170 live threads in jconsole for this broker. Maybe a thread is blocked.

          Any suggestion is welcome. I am looking into this issue until I find a fix.

          Rob Davies added a comment -

          Looks fixed for 5.3

          Torsten Mielke added a comment -

          I also took the second test case DispatchMultipleConsumersTest.java and wrapped it in a maven JUnit project (see AMQ-2009Testcase2.zip) to run it more quickly. I first tested against version 5.3 and did not reproduce any errors. The JUnit test succeeded many times (even under a higher load than initially implemented). When testing against version 5.1 I did reproduce the duplicate message problem AMQ-2020 but not this bug!

          So that exhausts the two testcases we have so far. I have not been able to reproduce this issue.
          Has anyone else who commented on this bug tried version 5.3 yet? What are the results?

          Erik Drolshammer added a comment -

          Could this be related to https://issues.apache.org/activemq/browse/AMQ-2169?

          Torsten Mielke added a comment - - edited

          I don't want to say anything about the bug itself, but I took consumertest.zip and turned it into a Maven JUnit test (see attached testcase.zip) in order to investigate this problem. This new testcase also reproduces the same behavior; however, I do not believe it shows a bug. Let me explain why.

          Both consumers listen on the same queue.
          The first consumer only closes its session after the second consumer tried to receive the message that the second producer sent to the queue. So the first consumer is still active when the second consumer calls f_consumer.receive(5000);

          So what happens at runtime is that even though both consumers use a pull mode (by calling consumer.receive(5000); ) there is a default prefetch size (as this is not set explicitly) that is used for each consumer.
          The first consumer acked the first message so it is available to receive more messages (even though it does not actively call f_consumer.receive()). So when the second message appears on the queue, the broker sends it right to the first consumer where it stays in the prefetch queue until it either gets received by the consumer calling f_consumer.receive() or the session gets closed. If in the testcase you call

          consumerTest.receiveMessage(dinges);
          

          rather than

          consumerTestNew.receiveMessage(dinges);
          

          the message is received fine.

          So there are two ways to work around this:

          1. have the first consumer close its session before the second consumer receives the message:

          ConsumerProblemTest.java
          ...
          consumerTest.close();
          
          //create second consumer and read msg
          DurableConsumer consumerTestNew = new DurableConsumer();
          consumerTestNew.init();
          consumerTestNew.receiveMessage(dinges);
          consumerTestNew.close();
          

          2. use a prefetch limit of 0 so messages do not get prefetched to consumers:

          DurableConsumer.java
          env.put(InitialContext.PROVIDER_URL, "tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=0");
          

          With any of the two changes the testcase succeeds. Give it a go. Simply run mvn test, it should fail out of the box. Then make these changes, the testcase will succeed.
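
          The same prefetch setting can also be applied programmatically when the ActiveMQ connection factory is built directly rather than looked up via JNDI. A sketch, with the broker URL as an assumption:

          import org.apache.activemq.ActiveMQConnectionFactory;

          public class ZeroPrefetchFactory {
              public static ActiveMQConnectionFactory create() {
                  ActiveMQConnectionFactory factory =
                          new ActiveMQConnectionFactory("tcp://localhost:61616");
                  // Disable queue prefetch so the broker never parks messages
                  // in an idle consumer's prefetch buffer.
                  factory.getPrefetchPolicy().setQueuePrefetch(0);
                  return factory;
              }
          }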

          Lukasz Zielinski added a comment -

          We ran into the same problem using topics (with 5.2.0) - we use only TCP transport and persistence is disabled.

          After some time (I will make this more specific when I get more data from support) our application stops receiving and publishing messages on a topic.
          When I check the broker using the JMX console, ~13000 in-flight messages are reported.

          Maurits Lucas added a comment -

          Ho hum, turns out we had producer flow control disabled in our setup. Switching it on solved the issue, now our tests process all 5.9 million messages correctly and ActiveMQ no longer hangs.

          And I think the max value for the inflight counter of 32766 observed in earlier tests is explained by the default prefetch size for topics: 32766.
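
          For anyone who had disabled it, a sketch of explicitly (re-)enabling producer flow control for all topics in activemq.xml; producerFlowControl is on by default, so this simply states the default:

              <destinationPolicy>
                  <policyMap>
                      <policyEntries>
                          <policyEntry topic=">" producerFlowControl="true"/>
                      </policyEntries>
                  </policyMap>
              </destinationPolicy>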

          Brecht Yperman added a comment -

          If I introduce a step between 10 and 11: receive message using consumer2, the message sent in step 6 is received.

          http://activemq.apache.org/point-to-point-with-multiple-consumers.html

          This says the behaviour is undefined, so I shouldn't nag, I guess?

          Brecht Yperman added a comment -

          This is some test code (very ugly, sorry) that keeps one consumer open and then tries to retrieve a message using a second consumer, but that consumer never receives the message.

          Brecht Yperman added a comment -

          I have a very similar problem, though much easier to reproduce.

          I might be doing something wrong however (test runs on Websphere MQ though).

          On one thread:
          1 - Create connection1, session1, producer1 for queue queue1
          2 - send message using producer1
          3 - Create connection2, session2, consumer2 for queue queue1
          4 - receive message using consumer2
          5 - Create connection3, session3, producer3 for queue queue1
          6 - send message using producer3
          7 - close producer3, session3, connection3
          8 - Create connection4, session4, consumer4 for queue queue1
          9 - receive message using consumer4
          10 - close consumer4, session4, connection4
          11 - close consumer2, session2, connection2
          12 - close producer1, session1, connection1

          Step 9 gives a timeout, the message sent in step 6 is never received...

          I will attach my testcode...
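
          A compact, self-contained sketch of the step sequence above, written against ActiveMQ (the broker URL and queue name are assumptions, and error handling is omitted):

          import javax.jms.*;
          import org.apache.activemq.ActiveMQConnectionFactory;

          public class TwoConsumerRepro {
              public static void main(String[] args) throws JMSException {
                  ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");

                  // steps 1-2: first producer sends a message
                  Connection c1 = cf.createConnection(); c1.start();
                  Session s1 = c1.createSession(false, Session.AUTO_ACKNOWLEDGE);
                  s1.createProducer(s1.createQueue("queue1")).send(s1.createTextMessage("first"));

                  // steps 3-4: consumer2 receives it and then stays open
                  Connection c2 = cf.createConnection(); c2.start();
                  Session s2 = c2.createSession(false, Session.AUTO_ACKNOWLEDGE);
                  MessageConsumer consumer2 = s2.createConsumer(s2.createQueue("queue1"));
                  System.out.println("step 4: " + consumer2.receive(5000));

                  // steps 5-7: second producer sends another message, then closes
                  Connection c3 = cf.createConnection(); c3.start();
                  Session s3 = c3.createSession(false, Session.AUTO_ACKNOWLEDGE);
                  s3.createProducer(s3.createQueue("queue1")).send(s3.createTextMessage("second"));
                  c3.close();

                  // steps 8-9: consumer4 waits but times out; per the later analysis on this
                  // issue, the second message can already sit in consumer2's prefetch buffer
                  Connection c4 = cf.createConnection(); c4.start();
                  Session s4 = c4.createSession(false, Session.AUTO_ACKNOWLEDGE);
                  MessageConsumer consumer4 = s4.createConsumer(s4.createQueue("queue1"));
                  System.out.println("step 9: " + consumer4.receive(5000)); // typically null here

                  // steps 10-12: tear down
                  c4.close(); c2.close(); c1.close();
              }
          }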

          Maurits Lucas added a comment -

          Screenshot of topic t1 with inflight counter graphed

          Maurits Lucas added a comment -

          We have been able to reproduce this issue in our test runs every time.

          The only problem, and the reason I haven't got a unit test for you, is that it takes a very long time to rear its ugly head.

          The minimum setup which exposes the issue is to have a producer (P1) send a lot of messages (6 million in our case) to a topic (t1) and have one consumer (C1) consume the messages from the topic and publish them to a queue (q1).

          P1 --> [t1] --> C1 --> [q1]

          Both P1 and C1 produce and consume / produce messages as fast as they can. Because P1 produces faster than C1 can consume, we get a fast producer / slow consumer for that topic.

          At the end of the run, there are 6 million messages on the queue, but if you add a consumer to the queue, it won't consume any messages. If you browse the queue using the webconsole, it appears empty. Only restarting ActiveMQ gets q1 working again.

          In our previous tests, we discovered that this "freezing" of dispatching (it affects other queues than q1 as well) starts somewhere during the publishing of the 6 million messages to t1.

          We are using a standalone broker, Kaha for persistence and Spring 2.5.4 for JMS support.

          OS: Linux (2.6.18-92.1.18.el5 #1 SMP Wed Nov 12 09:19:49 EST 2008 x86_64 x86_64 x86_64 GNU/Linux)
          Java:
          java version "1.6.0_10"
          Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
          Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

          We have tested with 5.1.0, 5.2.0 and 5.3-SNAPSHOT. Using 5.1.0 and the snapshot we ran into the issue, with 5.2.0 we ran into AMQ-2020 and so didn't test any further.

          We saw some peculiar values for the inflight counter on t1, this leveled out at 32766 for hours on end, sometimes going down to 32765 for a couple of seconds. Not sure if that is significant, but I am including a screenshot of JConsole just in case.

          Maarten Dirkse added a comment -

          For what it's worth: the 5.3-SNAPSHOT does seem to fix the duplicate message issue (no more exceptions are reported), but it does not fix this bug. Our test runs hang our broker at more or less the same point (1.4 million messages) for 5.1, 5.2 and 5.3-SNAPSHOT.

          Gary Tully added a comment -

          On 5.2 giving duplicate messages: the recent fix for AMQ-2020 may be relevant. If you get a chance, can you try out last night's 5.3-SNAPSHOT build? http://people.apache.org/repo/m2-snapshot-repository/org/apache/activemq/apache-activemq/5.3-SNAPSHOT/

          Rajani Chennamaneni added a comment -

          Re-attaching the test case with ASF grant.

          Gary Tully added a comment -

          Could you re-submit your test case (attach the file again) and tick the "ASF Granted License" box so that your test can be included in the code base?


            People

            • Assignee:
              Rob Davies
              Reporter:
              Rajani Chennamaneni
            • Votes:
              23
              Watchers:
              32

              Dates

              • Created:
                Updated:
