Affects Version/s: 5.2.0
Fix Version/s: 5.x
Ubuntu Linux 2.6.24-22
Processor with 2 cores
Bitronix Transaction Manager 1.3.2
I tried to read about 200000 messages from a queue. Reading was performed in XA transactions in chunks of 1000 messages per transaction.
After reading some messages (I almost never got past 100000 read messages), an exception occurred in the ActiveMQ log file (data/activemq.log).
In my application, the read returned null instead of a message. All further reads also returned null until a new transaction was opened.
No exception was thrown and the transaction was not marked as rolled back.
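The read pattern described above (XA transactions of 1000 messages each, reading until a read returns null) can be sketched roughly as follows. This is a minimal, hypothetical sketch and not the attached test case: `Source` is an assumed stand-in for a JMS `MessageConsumer` (whose `receive(long)` returns null when no message arrives in time), `Tx` is an assumed stand-in for the transaction manager's begin/commit, and `readAll` is an illustrative name. It shows why the symptom is awkward for a reader: in a loop like this, a spurious null cannot be distinguished from an empty queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ChunkedReader {

    // Hypothetical stand-in for a JMS MessageConsumer; returns null when no message is delivered.
    interface Source {
        String receive();
    }

    // Hypothetical stand-in for the XA transaction boundary managed by the transaction manager.
    interface Tx {
        void begin();
        void commit();
    }

    // Reads messages in transactions of at most chunkSize messages until receive() returns null.
    static List<String> readAll(Source source, Tx tx, int chunkSize) {
        List<String> read = new ArrayList<>();
        boolean done = false;
        while (!done) {
            tx.begin();
            for (int inChunk = 0; inChunk < chunkSize; inChunk++) {
                String msg = source.receive();
                if (msg == null) {
                    // In the reported bug, null can show up here although the queue is not empty,
                    // so this loop cannot tell a spurious null from an empty queue.
                    done = true;
                    break;
                }
                read.add(msg);
            }
            tx.commit(); // commit whatever was read in this transaction
        }
        return read;
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();
        for (int i = 0; i < 2500; i++) {
            queue.add("msg-" + i);
        }
        int[] commits = {0};
        List<String> read = readAll(queue::poll, new Tx() {
            public void begin() { }
            public void commit() { commits[0]++; }
        }, 1000);
        System.out.println(read.size() + " messages in " + commits[0] + " transactions");
    }
}
```

Running `main` prints `2500 messages in 3 transactions` (two full chunks of 1000 and a final chunk of 500).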
After that, the number of messages read by my application and the number of dequeued messages shown in the ActiveMQ JMX interface were often out of sync. It appears that ActiveMQ considered the transaction rolled back while my application did not.
Another indication is that, some time after the incident, my application always receives one of the already-read messages a second time.
When I set prefetchPolicy.all to 1, the problem at first seemed to disappear: in most runs there were no exceptions in the log file and a null message was never returned. After more testing, the problem also occurred with prefetch size 1 (see attached log file). However, it seems to occur less often and only with more messages in the queue (I tested with 280000). In the one run where it occurred, there were no duplicate messages, but the JMX interface reported 280200 messages dequeued even though only 280000 had been in the queue. It also reported a queue size of -200 once the queue was empty.
I also tried prefetchPolicy.all=0, but that caused a different problem: again there were no exceptions, but after some time the reader thread of the application hung completely and never returned. I tried this only once, but it should probably be investigated as well unless there is a simple explanation.
I wrote a test case that demonstrates the bug (see attachment). It always fails with 200000 messages and almost always with 100000 messages; with 20000 messages, for example, the problem almost never occurs. Before running the test, I stopped the ActiveMQ server, deleted the data directory and restarted the server. The test then writes all messages into a queue and reads them back.
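Assuming the reported queue size tracks enqueued minus dequeued messages (the `QueueSize`, `EnqueueCount` and `DequeueCount` attributes of the queue MBean), the figures from the prefetch-1 run are consistent with exactly 200 surplus dequeues:

```latex
\text{QueueSize} = \text{EnqueueCount} - \text{DequeueCount} = 280000 - 280200 = -200
```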
The stacktrace from the activemq log file is attached.
The configuration of the broker in my activemq.xml configuration file:
The configuration of the transaction manager (from the Spring application context):
The queue definition from the applicationContext: