
Key: AMQ-2009
Type: Bug
Status: Reopened
Priority: Major
Assignee: Rob Davies
Reporter: Rajani Chennamaneni
Votes: 15
Watchers: 21
ActiveMQ

Problem with message dispatch after a while

Created: 20/Nov/08 03:58 PM   Updated: 04/Sep/09 12:04 PM
Component/s: Broker
Affects Version/s: 5.1.0, 5.2.0
Fix Version/s: 5.4.0

Time Tracking:
Not Specified

File Attachments:

1. AMQ-2009Testcase2.zip (Zip Archive, 7 kB), 2009-03-20 07:27 AM, Torsten Mielke. Licensed for inclusion in ASF works.
2. consumertest.zip (Zip Archive, 3 kB), 2008-12-15 09:14 AM, Brecht Yperman.
3. DispatchMultipleConsumersTest.java (Java Source File, 6 kB), 2008-11-21 10:55 AM, Rajani Chennamaneni. Licensed for inclusion in ASF works.
4. testcase.zip (Zip Archive, 9 kB), 2009-03-18 10:48 AM, Torsten Mielke. Licensed for inclusion in ASF works.

Image Attachments:

1. JConsole-screenshot.jpg
(105 kB)


Description
After a while the broker stops dispatching messages (although it still accepts new incoming messages) until it is restarted. This problem is described in several posts:

http://www.nabble.com/Pending-Messages-are-shown-in-ActiveMQ-td20241332.html

http://www.nabble.com/Consumer-Listener-stop-receving-message-until-ActiveMQ-restart-td20355247.html

http://www.nabble.com/Stuck-messages---Dispatch-issues-td20467949.html

An issue was also opened in the Spring project for this, on the assumption that it was a Spring problem.

http://jira.springframework.org/browse/SPR-5110

I am not able to reproduce it with a JUnit test case that starts a BrokerService within the test. I guess I am not hitting the right stress conditions this way. But when I run the test case against an externally running ActiveMQ instance backed by Oracle database persistence, it is reproducible most of the time. It does not fail on every run, and the time it takes to fail varies from one run to the next.

I was able to hit this situation of stuck messages on a queue most of the time using the following scenario:

1) Start 2 concurrent consumers for the queue using Spring's DefaultMessageListenerContainer with cacheLevelName set to CACHE_CONSUMER.
2) Send messages to the queue on a standalone ActiveMQ broker instance using JMeter 2.3.2, with 50 threads looping 20 times.
3) After a while, you will notice that Spring logs that no messages are being received, but the messages are still shown in JConsole for ActiveMQ and in the database backing it for persistence.

In 5.2 RC3 the problem is different: the broker dispatches duplicate messages and does not properly remove them from its database after they are acknowledged.

The attached test case might help to reproduce the problem when run against an externally running standalone ActiveMQ broker. Another way to see the problem is to load test with JMeter by sending messages to a queue that has a Camel route moving messages from it to another queue. You will notice that the route stops moving messages after a while, or copies duplicates in the case of 5.2 RC3.

Sorry about such a huge description, but it is a real problem! A different team at our company is having this issue in production with 5.1. They are using it as an embedded broker with Derby for persistence.



Comments
Gary Tully added a comment - 21/Nov/08 08:37 AM
Could you re submit your test case (attach file, again) and tick the "ASF Granted License" box so that your test can be included in the code base?

Rajani Chennamaneni added a comment - 21/Nov/08 10:55 AM
Re-attaching the test case with ASF grant.

Gary Tully added a comment - 04/Dec/08 03:44 AM
Regarding 5.2 giving duplicate messages: the recent fix for AMQ-2020 may be relevant. If you get a chance, can you try out last night's 5.3-SNAPSHOT build? http://people.apache.org/repo/m2-snapshot-repository/org/apache/activemq/apache-activemq/5.3-SNAPSHOT/

Maarten added a comment - 09/Dec/08 03:36 AM
For what it's worth: the 5.3-SNAPSHOT does seem to fix the duplicate message issue (no more exceptions are reported), but it does not fix this bug. Our test runs hang our broker at more or less the same point (1.4 million messages) for 5.1, 5.2 and 5.3-SNAPSHOT.

Maurits Lucas added a comment - 15/Dec/08 02:29 AM
We have been able to reproduce this issue in our test runs every time.

The only problem, and the reason I haven't got a unit test for you, is that it takes a very long time to rear its ugly head.

The minimum setup which exposes the issue is to have a producer (P1) send a lot of messages (6 million in our case) to a topic (t1) and have one consumer (C1) consume the messages from the topic and publish them to a queue (q1).

P1 --> [t1] --> C1 --> [q1]

Both P1 and C1 produce and consume / produce messages as fast as they can. Because P1 produces faster than C1 can consume, we get a fast producer / slow consumer for that topic.

At the end of the run, there are 6 million messages on the queue, but if you add a consumer to the queue, it won't consume any messages. If you browse the queue using the webconsole, it appears empty. Only restarting ActiveMQ gets q1 working again.

In our previous tests, we discovered that this "freezing" of dispatching (it affects other queues than q1 as well) starts somewhere during the publishing of the 6 million messages to t1.

We are using a standalone broker, Kaha for persistence and Spring 2.5.4 for JMS support.

OS: Linux (2.6.18-92.1.18.el5 #1 SMP Wed Nov 12 09:19:49 EST 2008 x86_64 x86_64 x86_64 GNU/Linux)
Java:
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

We have tested with 5.1.0, 5.2.0 and 5.3-SNAPSHOT. Using 5.1.0 and the snapshot we ran into the issue, with 5.2.0 we ran into AMQ-2020 and so didn't test any further.

We saw some peculiar values for the inflight counter on t1: it leveled out at 32766 for hours on end, sometimes going down to 32765 for a couple of seconds. Not sure if that is significant, but I am including a screenshot of JConsole just in case.


Maurits Lucas added a comment - 15/Dec/08 02:30 AM
Screenshot of topic t1 with inflight counter graphed

Brecht Yperman added a comment - 15/Dec/08 09:11 AM
I have a very similar problem, though much easier to reproduce.

I might be doing something wrong however (test runs on Websphere MQ though).

On one thread:
1 - Create connection1, session1, producer1 for queue queue1
2 - send message using producer1
3 - Create connection2, session2, consumer2 for queue queue1
4 - receive message using consumer2
5 - Create connection3, session3, producer3 for queue queue1
6 - send message using producer3
7 - close producer3, session3, connection3
8 - Create connection4, session4, consumer4 for queue queue1
9 - receive message using consumer4
10 - close consumer4, session4, connection4
11 - close consumer2, session2, connection2
12 - close producer1, session1, connection1

Step 9 gives a timeout, the message sent in step 6 is never received...

I will attach my testcode...


Brecht Yperman added a comment - 15/Dec/08 09:14 AM
This is some test code (very ugly, sorry) that keeps one consumer open and then tries to retrieve a message using a second consumer, but the second consumer never receives the message.

Brecht Yperman added a comment - 15/Dec/08 09:18 AM
If I introduce a step between 10 and 11: receive message using consumer2, the message sent in step 6 is received.

http://activemq.apache.org/point-to-point-with-multiple-consumers.html

This says the behaviour is undefined, so I shouldn't nag, I guess?


Maurits Lucas added a comment - 30/Dec/08 02:45 AM
Ho hum, turns out we had producer flow control disabled in our setup. Switching it on solved the issue, now our tests process all 5.9 million messages correctly and ActiveMQ no longer hangs.

And I think the max value for the inflight counter of 32766 observed in earlier tests is explained by the default prefetch size for topics: 32766.
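
As a rough illustration of why the counter would plateau at the prefetch size, here is a minimal toy model. It is not ActiveMQ code: InflightModel and its methods are hypothetical names, and the only figure taken from this thread is the prefetch value of 32766 (which equals Short.MAX_VALUE - 1). The assumption is simply that the broker keeps dispatching to a subscriber until the number of unacknowledged messages reaches the prefetch limit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model (not ActiveMQ code) of one slow topic subscriber with a prefetch limit. */
final class InflightModel {
    // 32766 is the value reported in this thread; it equals Short.MAX_VALUE - 1.
    static final int TOPIC_PREFETCH = Short.MAX_VALUE - 1;

    private final int prefetch;
    private final Deque<Integer> pending = new ArrayDeque<>();
    private int inflight;

    InflightModel(int prefetch) {
        this.prefetch = prefetch;
    }

    /** Producer side: queue a message on the broker, then try to dispatch it. */
    void publish(int messageId) {
        pending.add(messageId);
        dispatch();
    }

    /** Consumer side: acknowledge one delivered message, freeing one prefetch slot. */
    void ack() {
        if (inflight > 0) {
            inflight--;
        }
    }

    /** Broker side: push pending messages until the prefetch window is full. */
    void dispatch() {
        while (inflight < prefetch && !pending.isEmpty()) {
            pending.poll();
            inflight++;
        }
    }

    int inflight() {
        return inflight;
    }

    int pending() {
        return pending.size();
    }

    public static void main(String[] args) {
        InflightModel sub = new InflightModel(TOPIC_PREFETCH);
        for (int i = 0; i < 100_000; i++) {
            sub.publish(i);            // fast producer, consumer has not acked anything yet
        }
        System.out.println("after burst:   " + sub.inflight());
        sub.ack();                     // slow consumer finishes one message
        System.out.println("after one ack: " + sub.inflight());
        sub.dispatch();                // broker refills the freed slot
        System.out.println("after refill:  " + sub.inflight());
    }
}
```

With a producer that is always ahead of the consumer, the model sits at the prefetch limit and drops by one only in the short window between an ack and the next dispatch, which matches the 32766 / 32765 pattern described above.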


Lukasz Zielinski added a comment - 11/Mar/09 02:02 AM
We ran into the same problem using topics (with 5.2.0) - we use only TCP transport and persistence is disabled.

After some time (I will make this more specific when I get more data from support) our application stops receiving and publishing messages on a topic.
When I check the broker using JMX console ~13000 messages in flight are reported.


Torsten Mielke added a comment - 18/Mar/09 11:07 AM - edited
I don't want to say anything about the bug itself, but I took consumertest.zip and turned it into a Maven JUnit test (see attached testcase.zip) in order to investigate this problem. The new testcase reproduces the same behavior; however, I do not believe it shows a bug. Let me explain why.

Both consumers listen on the same queue.
The first consumer only closes its session after the second consumer tried to receive the message that the second producer sent to the queue. So the first consumer is still active when the second consumer calls f_consumer.receive(5000);

So what happens at runtime is that even though both consumers use a pull mode (by calling consumer.receive(5000); ) there is a default prefetch size (as this is not set explicitly) that is used for each consumer.
The first consumer acked the first message so it is available to receive more messages (even though it does not actively call f_consumer.receive()). So when the second message appears on the queue, the broker sends it right to the first consumer where it stays in the prefetch queue until it either gets received by the consumer calling f_consumer.receive() or the session gets closed. If in the testcase you call

consumerTest.receiveMessage(dinges);

rather than

consumerTestNew.receiveMessage(dinges);

the message is received fine.

So there are two ways to work around this:

1. have the first consumer close its session before the second consumer receives the message:

ConsumerProblemTest.java
...
consumerTest.close();

//create second consumer and read msg
DurableConsumer consumerTestNew = new DurableConsumer();
consumerTestNew.init();
consumerTestNew.receiveMessage(dinges);
consumerTestNew.close();

2. use a prefetch limit of 0 so messages do not get prefetched to consumers:

DurableConsumer.java
env.put(InitialContext.PROVIDER_URL, "tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=0");

With either of the two changes the testcase succeeds. Give it a go: simply run mvn test, which should fail out of the box. Then make one of these changes and the testcase will succeed.
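
To make the prefetch explanation concrete, here is a minimal, self-contained sketch. It is a toy model written for illustration, not ActiveMQ code and not part of the attached testcase: PrefetchModel, Consumer and scenario are hypothetical names, a null return stands in for the receive(5000) timeout, the prefetch value of 1000 is only an assumed stand-in for the default, and the "first consumer with room" dispatch rule is a simplification.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model (not ActiveMQ code) of queue dispatch into per-consumer prefetch buffers. */
final class PrefetchModel {
    static final class Consumer {
        final int prefetch;
        final Deque<String> buffer = new ArrayDeque<>();

        Consumer(int prefetch) {
            this.prefetch = prefetch;
        }
    }

    private final Deque<String> queue = new ArrayDeque<>();
    private final List<Consumer> consumers = new ArrayList<>();

    Consumer subscribe(int prefetch) {
        Consumer c = new Consumer(prefetch);
        consumers.add(c);
        push();
        return c;
    }

    void send(String msg) {
        queue.add(msg);
        push();
    }

    /** Returns the next message for c, or null (standing in for a receive timeout). */
    String receive(Consumer c) {
        if (c.prefetch == 0) {
            return queue.poll();          // prefetch 0: the consumer pulls on demand
        }
        String msg = c.buffer.poll();     // otherwise only the local buffer is visible
        push();
        return msg;
    }

    /** Closing a consumer returns its unconsumed prefetched messages to the queue. */
    void close(Consumer c) {
        consumers.remove(c);
        while (!c.buffer.isEmpty()) {
            queue.addFirst(c.buffer.pollLast());
        }
        push();
    }

    /** Broker push: hand queued messages to the first consumer with buffer room. */
    private void push() {
        while (!queue.isEmpty()) {
            Consumer target = null;
            for (Consumer c : consumers) {
                if (c.buffer.size() < c.prefetch) {
                    target = c;
                    break;
                }
            }
            if (target == null) {
                return;                   // nobody has room; messages wait on the queue
            }
            target.buffer.add(queue.poll());
        }
    }

    /** Replays the testcase steps; returns what the second consumer receives. */
    static String scenario(int prefetch, boolean closeFirstConsumerEarly) {
        PrefetchModel broker = new PrefetchModel();
        broker.send("m1");
        Consumer first = broker.subscribe(prefetch);
        broker.receive(first);            // first consumer takes m1 and stays open
        broker.send("m2");                // pushed into first's buffer if prefetch > 0
        if (closeFirstConsumerEarly) {
            broker.close(first);          // workaround 1
        }
        Consumer second = broker.subscribe(prefetch);
        return broker.receive(second);
    }

    public static void main(String[] args) {
        System.out.println("default prefetch, first still open: " + scenario(1000, false));
        System.out.println("first consumer closed early:        " + scenario(1000, true));
        System.out.println("prefetch 0:                         " + scenario(0, false));
    }
}
```

In this model the second consumer gets nothing while the first consumer is still open with a non-zero prefetch, and it gets the second message once the first consumer is closed early or the prefetch is 0, mirroring the two workarounds above.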


Erik Drolshammer added a comment - 19/Mar/09 08:20 AM

Torsten Mielke added a comment - 20/Mar/09 07:27 AM
I also took the second test case DispatchMultipleConsumersTest.java and wrapped it in a maven JUnit project (see AMQ-2009Testcase2.zip) to run it more quickly. I first tested against version 5.3 and did not reproduce any errors. The JUnit test succeeded many times (even under a higher load than initially implemented). When testing against version 5.1 I did reproduce the duplicate message problem AMQ-2020 but not this bug!

So that exhausts the two testcases we have so far. I have not been able to reproduce this issue.
Has anyone else who commented on this bug tried version 5.3 yet? What are the results?


Rob Davies added a comment - 23/May/09 02:01 AM
Looks fixed for 5.3

ying added a comment - 10/Jun/09 02:45 PM
I see this in trunk 771718. Our network of 4 brokers had been running for almost 30 days when we saw that one broker was no longer dispatching messages. It has about 400 MB of data in its data directory.

We restarted the application so that it talks to another broker, and restarted the problematic broker. It starts fine, bridges OK and sees the consumer on the other broker, but none of the messages stuck on this broker are dispatched.

I will try to gather more information as I learn more. A simple unit test of this case passes with fewer messages. I suspect this happens when a lot of messages are stuck on the queue, and that dispatch then fails even after a restart. BTW, we have enough memory on the box for the broker to run, and JConsole shows it is using only a portion of it. There is no error in the log with debug turned on.

DemandForwardingBridgeSupport's serviceLocalCommand is supposed to be called but is not. Could this be a threading issue? Regarding the demand forwarding bridge, do you know the name of the thread I should look for in JConsole? Due to the complexity of our system, there are about 170 live threads in JConsole for this broker. Maybe a thread is blocked.

Any suggestion is welcome. I will keep looking into this issue until I find a fix.


ying added a comment - 11/Jun/09 03:26 PM
I have a simple case that causes a dispatch problem:

1. Set up a network of broker1 and broker2, bridged by multicast discovery.
2. Have a producer send 5 messages to queueA on broker2.
3. Have a consumer consume from queueA on broker1 (make it slow, so it only consumes 1 message), but make sure all 5 messages from broker2 are forwarded to broker1.
4. Stop the consumer on broker1 and restart it so that it consumes from queueA on broker2.
5. The 4 messages that were originally published to broker2, forwarded to broker1 and not yet consumed are stuck on broker1 and are not forwarded to broker2 for the consumer to consume.

In this case you can only clear out the 4 leftover stuck messages by making the consumer consume from broker1.
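
The observed behavior in the steps above can be sketched with a small toy model. This is not ActiveMQ code, and the cause it encodes is an assumption that this thread does not confirm: namely, that the bridge declines to forward a message to a broker that is already on the message's path. BridgeModel, Msg, forward and the allowReplay switch are all hypothetical names used only for comparison.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (not ActiveMQ code) of a two-broker network with demand forwarding. */
final class BridgeModel {
    /** A message plus the set of brokers it has already visited. */
    static final class Msg {
        final int id;
        final Set<String> path = new HashSet<>();

        Msg(int id) {
            this.id = id;
        }
    }

    private final Map<String, Deque<Msg>> queues = new HashMap<>();
    private final boolean allowReplay;   // hypothetical switch, for comparison only

    BridgeModel(boolean allowReplay) {
        this.allowReplay = allowReplay;
        queues.put("broker1", new ArrayDeque<>());
        queues.put("broker2", new ArrayDeque<>());
    }

    void produce(String broker, int id) {
        Msg m = new Msg(id);
        m.path.add(broker);
        queues.get(broker).add(m);
    }

    /** A consumer on `to` creates demand: forward eligible messages from `from`. */
    void forward(String from, String to) {
        Deque<Msg> kept = new ArrayDeque<>();
        for (Msg m : queues.get(from)) {
            // Assumed rule: never send a message back to a broker it has visited.
            if (allowReplay || !m.path.contains(to)) {
                m.path.add(to);
                queues.get(to).add(m);
            } else {
                kept.add(m);
            }
        }
        queues.put(from, kept);
    }

    /** Consumes one message from the broker's queue; returns its id or null. */
    Integer consume(String broker) {
        Msg m = queues.get(broker).poll();
        return m == null ? null : m.id;
    }

    int depth(String broker) {
        return queues.get(broker).size();
    }

    /** Replays steps 2 to 5; returns {depth on broker1, depth on broker2}. */
    static int[] scenario(boolean allowReplay) {
        BridgeModel net = new BridgeModel(allowReplay);
        for (int i = 1; i <= 5; i++) {
            net.produce("broker2", i);
        }
        net.forward("broker2", "broker1");   // consumer attaches to broker1
        net.consume("broker1");              // slow consumer takes one message, then stops
        net.forward("broker1", "broker2");   // consumer reattaches to broker2
        return new int[] {net.depth("broker1"), net.depth("broker2")};
    }

    public static void main(String[] args) {
        int[] stuck = scenario(false);
        System.out.println("no replay: broker1=" + stuck[0] + " broker2=" + stuck[1]);
        int[] replay = scenario(true);
        System.out.println("replay:    broker1=" + replay[0] + " broker2=" + replay[1]);
    }
}
```

Under the assumed rule, the 4 unconsumed messages stay on broker1 after the consumer moves, as in step 5. Whether the real bridge behaves this way for this reason would need to be checked against the DemandForwardingBridgeSupport code.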

This is a very critical issue because you might restart your application many times and run into this.

I will look into how DemandForwardingBridge handles this case. If you know anything about this, please help. This is very urgent. Thanks.


Gary Tully added a comment - 24/Jun/09 04:21 AM
Pushing this out to 5.4.0, as more information and analysis are needed. The most recent comments may point to a different issue altogether.