Gary Tully added a comment - 21/Nov/08 08:37 AM
Could you resubmit your test case (attach the file again) and tick the "ASF Granted License" box so that your test can be included in the code base?
Re-attaching the test case with ASF grant.
on 5.2 giving duplicate messages. The recent fix for
For what it's worth: the 5.3-SNAPSHOT does seem to fix the duplicate message issue (no more exceptions are reported), but it does not fix this bug. Our test runs hang our broker at more or less the same point (1.4 million messages) for 5.1, 5.2 and 5.3-SNAPSHOT.
We have been able to reproduce this issue in our test runs every time.
The only problem, and the reason I haven't got a unit test for you, is that it takes a very long time to rear its ugly head. The minimum setup which exposes the issue is to have a producer (P1) send a lot of messages (6 million in our case) to a topic (t1) and have one consumer (C1) consume the messages from the topic and publish them to a queue (q1).
P1 --> [t1] --> C1 --> [q1]
Both P1 and C1 produce and consume / produce messages as fast as they can. Because P1 produces faster than C1 can consume, we get a fast producer / slow consumer for that topic. At the end of the run, there are 6 million messages on the queue, but if you add a consumer to the queue, it won't consume any messages. If you browse the queue using the webconsole, it appears empty. Only restarting ActiveMQ gets q1 working again.
In our previous tests, we discovered that this "freezing" of dispatching (it affects other queues than q1 as well) starts somewhere during the publishing of the 6 million messages to t1.
We are using a standalone broker, Kaha for persistence and Spring 2.5.4 for JMS support.
OS: Linux (2.6.18-92.1.18.el5 #1 SMP Wed Nov 12 09:19:49 EST 2008 x86_64 x86_64 x86_64 GNU/Linux)
We have tested with 5.1.0, 5.2.0 and 5.3-SNAPSHOT. Using 5.1.0 and the snapshot we ran into the issue, with 5.2.0 we ran into
We saw some peculiar values for the inflight counter on t1: it leveled out at 32766 for hours on end, sometimes going down to 32765 for a couple of seconds. Not sure if that is significant, but I am including a screenshot of JConsole just in case.
Screenshot of topic t1 with inflight counter graphed
I have a very similar problem, though much easier to reproduce.
I might be doing something wrong however (test runs on Websphere MQ though). On one thread: Step 9 gives a timeout, the message sent in step 6 is never received... I will attach my testcode...
This is some testcode (very ugly, sorry) that keeps a consumer opened, and then tries to retrieve a message using a second consumer, but that consumer never receives the message.
If I introduce a step between 10 and 11: receive message using consumer2, the message sent in step 6 is received.
http://activemq.apache.org/point-to-point-with-multiple-consumers.html This says the behaviour is undefined, so I shouldn't nag, I guess?
Ho hum, turns out we had producer flow control disabled in our setup. Switching it on solved the issue: our tests now process all 5.9 million messages correctly and ActiveMQ no longer hangs.
And I think the max value for the inflight counter of 32766 observed in earlier tests is explained by the default prefetch size for topics: 32766.
We ran into the same problem using topics (with 5.2.0). We use only the TCP transport, and persistence is disabled.
After some time (I will make this more specific when I get more data from support) our application stops receiving and publishing messages on a topic.
I don't want to say anything about the bug itself, but I took consumertest.zip and turned it into a Maven JUnit test (see attached testcase.zip) in order to investigate this problem. This new testcase also reproduces the same behavior; however, I do not believe this testcase shows a bug. Let me explain why.
Both consumers listen on the same queue. So what happens at runtime is that even though both consumers use a pull mode (by calling consumer.receive(5000);) there is a default prefetch size (as this is not set explicitly) that is used for each consumer.
consumerTest.receiveMessage(dinges); rather than consumerTestNew.receiveMessage(dinges); the message is received fine.
So there are two ways to work around this:
1. have the first consumer close its session before the second consumer receives the message:
ConsumerProblemTest.java
    ...
    consumerTest.close();
    //create second consumer and read msg
    DurableConsumer consumerTestNew = new DurableConsumer();
    consumerTestNew.init();
    consumerTestNew.receiveMessage(dinges);
    consumerTestNew.close();
2. use a prefetch limit of 0 so messages do not get prefetched to consumers:
DurableConsumer.java
    env.put(InitialContext.PROVIDER_URL, "tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=0");
With either of the two changes the testcase succeeds. Give it a go. Simply run mvn test; it should fail out of the box. Then make these changes, and the testcase will succeed.
Could this be related to https://issues.apache.org/activemq/browse/AMQ-2169?
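The prefetch explanation and the two workarounds above can be illustrated with a toy model. This is only a sketch in plain Java, not ActiveMQ code: the `PrefetchModel`, `Broker` and `Consumer` names are hypothetical, dispatch is simplified to "first consumer with free buffer space" instead of the broker's real dispatch policy, and a `null` return stands in for a `receive(5000)` timeout. The value 1000 is assumed here as the default queue prefetch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of queue prefetch (hypothetical names, not the ActiveMQ API). */
class PrefetchModel {

    /** A consumer with a local prefetch buffer of limited capacity. */
    static final class Consumer {
        final int prefetch;
        final Deque<String> buffer = new ArrayDeque<>();

        Consumer(int prefetch) {
            this.prefetch = prefetch;
        }
    }

    static final class Broker {
        private final Deque<String> pending = new ArrayDeque<>();
        private final List<Consumer> consumers = new ArrayList<>();

        void subscribe(Consumer c) {
            consumers.add(c);
            dispatch();
        }

        void send(String msg) {
            pending.addLast(msg);
            dispatch();
        }

        /** Eagerly push pending messages to the first consumer with buffer space. */
        private void dispatch() {
            while (!pending.isEmpty()) {
                Consumer target = null;
                for (Consumer c : consumers) {
                    if (c.buffer.size() < c.prefetch) {
                        target = c;
                        break;
                    }
                }
                if (target == null) {
                    return; // nobody can take more; messages stay on the broker
                }
                target.buffer.addLast(pending.pollFirst());
            }
        }

        /** Models receive(timeout); null stands for a timeout. */
        String receive(Consumer c) {
            String msg = c.buffer.pollFirst();
            if (msg == null && c.prefetch == 0) {
                // With prefetch 0 the consumer pulls directly from the broker.
                msg = pending.pollFirst();
            }
            return msg;
        }

        /** Closing a consumer hands its unconsumed prefetched messages back. */
        void close(Consumer c) {
            consumers.remove(c);
            while (!c.buffer.isEmpty()) {
                pending.addFirst(c.buffer.pollLast());
            }
            dispatch();
        }
    }

    /** consumer1 is open when the message is sent; consumer2 then tries to receive it. */
    static String secondConsumerReceives(int prefetch, boolean closeFirst) {
        Broker broker = new Broker();
        Consumer first = new Consumer(prefetch);
        broker.subscribe(first);
        broker.send("dinges");
        if (closeFirst) {
            broker.close(first); // workaround 1
        }
        Consumer second = new Consumer(prefetch);
        broker.subscribe(second);
        return broker.receive(second);
    }
}
```

In this model, `secondConsumerReceives(1000, false)` returns `null` because the message already sits in the first consumer's buffer, while `secondConsumerReceives(1000, true)` (workaround 1) and `secondConsumerReceives(0, false)` (workaround 2) both return the message.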
I also took the second test case DispatchMultipleConsumersTest.java and wrapped it in a Maven JUnit project (see AMQ-2009Testcase2.zip) to run it more quickly. I first tested against version 5.3 and did not reproduce any errors. The JUnit test succeeded many times (even under a higher load than initially implemented). When testing against version 5.1 I did reproduce the duplicate message problem.
So that exhausts the two testcases we have so far. I have not been able to reproduce this issue.
I see this in Trunk 771718. Our network of 4 brokers had been running for almost 30 days when we saw that one broker was not dispatching msgs. It has about 400 MB of data in its data directory.
We restarted the application to talk to another broker and restarted the problematic broker. It starts fine, bridges OK and sees the consumer on the other broker, but none of the msgs stuck on this broker are dispatched. I will try to gather more information when I have a better idea of what is going on. A simple unit test of the case is OK with fewer msgs. I suspect this happens when a lot of msgs are stuck on the queue; it then fails to dispatch even after a restart. BTW, we have enough memory on the box for the broker to run, and jconsole shows it is using only a portion of it. There is no error in the log with debug turned on.
DemandForwardingBridgeSupport's serviceLocalCommand is supposed to be called but is not. Any possible threading issue? Regarding the demand forwarding bridge, do you know the name of the thread I should look for in jconsole? Due to the complexity of our system, there are about 170 live threads in jconsole for this broker. Maybe a thread is blocked. Any suggestion is welcome.
I am looking into this issue until I find a fix. I have a simple case which can cause the dispatch problem:
1. set up a network of broker1 and broker2, bridged by multicast discovery
You can only clear out the 4 leftover stuck msgs by making a consumer consume from broker1 in this case. This is a very critical issue because you might restart your application many times and run into this. I will look into how the demand forwarding bridge handles this case. If you know anything about this, please help. This is very urgent. Thanks.
Pushing this out to 5.4.0 as more information and analysis is needed. The most recent comments may point to a different issue altogether.