History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: AMQ-1837
Type: Bug Bug
Status: Open Open
Priority: Critical Critical
Assignee: Unassigned
Reporter: Jan Wiemer
Votes: 2
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
ActiveMQ

Lost messages after broker recovery

Created: 03/Jul/08 04:21 AM   Updated: 14/Jul/08 02:32 AM
Component/s: Message Store
Affects Version/s: 5.1.0
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  Size
Zip Archive Licensed for inclusion in ASF works MultithreadedQueueTest2.zip 2008-07-03 08:38 AM Helmut Janknecht 4 kb
Zip Archive src.zip 2008-07-03 04:21 AM Jan Wiemer 2 kb


 Description  « Hide
We experienced that some messages (send in persistent mode) were lost after the message broker was killed and restarted.
The attached little test simulates the situation:
There are 10 queues, 10 sender and 10 receiver threads each sending / receiving quite frequently to / from one queue for a quite long time.
The receiver for each queue expects the messages exactly in the order the sender has send the messages.
After restarting (killing) the broker the order of the received messages gets disturbed or messages get lost.
  • Start the broker as configures by default in the distribution (only switch on JMX support).
  • Start the main method of the attached class
  • Wait until some messages are received
  • kill or shutdown the broker
  • restart the broker
  • have a look at the output of the test class
    Output
    Exception in ReceiverThread: java.lang.RuntimeException: Received unexpected message! Expected: QUEUE-4: #3197, Received: QUEUE-4: #3267
Broker Output
...
INFO  BrokerService                  - Using Persistence Adapter: AMQPersistenceAdapter(D:\sandbox\Experimental\Activemq\apache-activemq-5.2-SNAPSHOT\bin\..\data)
INFO  AMQPersistenceAdapter          - AMQStore starting using directory: D:\sandbox\Experimental\Activemq\apache-activemq-5.2-SNAPSHOT\bin\..\data
INFO  KahaStore                      - Kaha Store using data directory D:\sandbox\Experimental\Activemq\apache-activemq-5.2-SNAPSHOT\bin\..\data\kr-store\state
INFO  AMQPersistenceAdapter          - Active data files: []
WARN  AMQPersistenceAdapter          - The ReferenceStore is not valid - recovering ...
INFO  KahaStore                      - Kaha Store successfully deleted data directory D:\sandbox\Experimental\Activemq\apache-activemq-5.2-SNAPSHOT\bin\..\data\kr-store\data
INFO  AMQPersistenceAdapter          - Journal Recovery Started from: DataManager:(data-)
INFO  KahaStore                      - Kaha Store using data directory D:\sandbox\Experimental\Activemq\apache-activemq-5.2-SNAPSHOT\bin\..\data\kr-store\data
INFO  AMQPersistenceAdapter          - Recovered 198261 operations from redo log in 18.028 seconds.
INFO  AMQPersistenceAdapter          - Finished recovering the ReferenceStore
INFO  BrokerService                  - ActiveMQ 5.2-SNAPSHOT JMS Message Broker (localhost) is starting
...


 All   Comments   Work Log   Change History   Subversion Commits   FishEye   Crucible      Sort Order: Ascending order - Click to sort in descending order
Jan Wiemer - 03/Jul/08 04:31 AM
By the way I used: apache-activemq-5.2-20080619.135742-2-bin.zip

Jan Wiemer - 03/Jul/08 05:21 AM
Note that the problem seems not to occur when working with a single queue (and one sender thread and one receiver thread).

Helmut Janknecht - 03/Jul/08 08:38 AM
I changed code a little bit and do the receive with a 120s timeout in order to avoid race conditions there.
Addtionally I now sent object messages with approx. 2K size. With prefetch size set to default it fails often after Kaha recovery. With low prefetch size 5 or 10 failure rate is lower but not 0...

Helmut Janknecht - 03/Jul/08 11:21 PM - edited
One example log of a failure, here actually without a restart:
QUEUE-2 currently sent message #10000
QUEUE-8 currently received message #9000
QUEUE-4 currently sent message #10000
QUEUE-5 currently sent message #10000
QUEUE-9 currently sent message #10000
QUEUE-0 currently received message #10000
QUEUE-3 currently sent message #10000
QUEUE-4 currently received message #10000
QUEUE-2 currently received message #10000
Unexpected message from QUEUE-8! Expected: QUEUE-8: #9342, Received: QUEUE-8: #9343

It skipped one message #9342.
Browsing QUEUE-8 with JMX showed:

----------------------------------------------------------------------
ID:dono-janknecht-4560-1215150766850-0:6:1:1:9344: QUEUE-8: #9344
ID:dono-janknecht-4560-1215150766850-0:6:1:1:9345: QUEUE-8: #9345
ID:dono-janknecht-4560-1215150766850-0:6:1:1:9346: QUEUE-8: #9346
ID:dono-janknecht-4560-1215150766850-0:6:1:1:9347: QUEUE-8: #9347
ID:dono-janknecht-4560-1215150766850-0:6:1:1:9348: QUEUE-8: #9348
ID:dono-janknecht-4560-1215150766850-0:6:1:1:9350: QUEUE-8: #9350
...
ID:dono-janknecht-4560-1215150766850-0:6:1:1:9474: QUEUE-8: #9474
ID:dono-janknecht-4560-1215150766850-0:6:1:1:9475: QUEUE-8: #9475
----------------------------------------------------------------------

So QUEUE-8 does not contain this element.

Then I restarted the broker and now surprisingly the missing "QUEUE-8: #9342" appears, the "JMSRedelivered" is set to false.

So it seems that under heavy load in multi threaded scenarios persistent messages somtimes stuck.
With DeliveryMode.NON_PERSISTENT it seems to work.


Helmut Janknecht - 14/Jul/08 02:32 AM
Any news here about this issue?
I retried with latest apache-activemq-5.2-20080713.231541-8 and same problem: after a few seconds one of receiver thread skips a message:
...
Q2 currently received: Q2: #2000
Q0 currently received: Q0: #2000
Unexpected message from Q4! Expected: Q4: #2155, Received: Q4: #2156 (JMSRedelivered=false)
Unexpected message from Q3! Expected: Q3: #2177, Received: Q3: #2178 (JMSRedelivered=false)
----------------------------------------------------------------------
ID:dono-janknecht-3517-1216026084546-0:0:1:1:2158: Q4: #2158
ID:dono-janknecht-3517-1216026084546-0:0:1:1:2160: Q4: #2160
ID:dono-janknecht-3517-1216026084546-0:0:1:1:2161: Q4: #2161
ID:dono-janknecht-3517-1216026084546-0:0:1:1:2164: Q4: #2164
ID:dono-janknecht-3517-1216026084546-0:0:1:1:2166: Q4: #2166
...

And this again without any restart or so but with PERSISTENT messages.
Here it skipped #2155 in Q4 and JMX browsing shows that also #2159 is missing.
If you then stop broker normally and restart these two messages are there again!?

I tried same scenario for example with Sun's OpenMQ 4.2 and there it works.
For us this is a real show stopper because we rely on strict queue message ordering...