Sometimes one or more messages hang in the queue and no consumer picks them up. It happens more often after a failover.
Scenario:
2 brokers (JDBC master/slave), 2 consumers (with prefetch set to 1), 2 producers
Producers:
ant producer -Durl="failover:(tcp://localhost:61618,tcp://localhost:61619)" -Ddurable=true -Dmax=500
Consumer 1:
ant consumer -Durl="failover:(tcp://localhost:61618,tcp://localhost:61619)" -Dmax=10000 -DclientId=c1
Consumer 2:
ant consumer -Durl="failover:(tcp://localhost:61618,tcp://localhost:61619)" -Dmax=10000 -DclientId=c2
1 - Start the two brokers (one will be master, the other will be slave)
2 - Start the producers and consumers
3 - Wait a little
4 - Kill the master -> the slave becomes master
5 - Producers continue producing, consumers continue consuming
6 - After all producers finish their task, the consumers finish consuming, and sometimes there are still messages left in the queue (visible both in the database and in the queue state shown through JMX).
7 - Start a new broker, then kill the master
8 - The remaining messages are consumed
There is a race condition between the time the message is assigned its broker sequence number (RegionBroker.java, in the send method) and the time it is actually put in the database (DefaultJDBCAdapter.java, in the doAddMessage method).
I have seen that a message with a higher sequence number is sometimes put in the database before one with a lower sequence number. For example, 386 is put in the database before 385. If this happens while JDBCMessageStore is recovering the next message (lastMessageId is 384), then 386 is fetched and lastMessageId changes to 386. 385 is then put in the database but never retrieved. Stopping and restarting the broker allows the message to be retrieved, because lastMessageId is -1 at startup.
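To make the gap concrete, here is a minimal, single-threaded sketch of the sequence described above. It is not ActiveMQ code: the class SequenceGapDemo and its methods are hypothetical stand-ins, and it assumes the store fetches only rows whose sequence number is greater than lastMessageId, as described in this report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a cursor-based store; not the real JDBCMessageStore.
public class SequenceGapDemo {
    // Stand-in for the message table, keyed by broker sequence number.
    private final TreeMap<Long, String> table = new TreeMap<>();
    // Stand-in for the store's cursor; -1 means "start from the beginning".
    private long lastMessageId = -1;

    // Stand-in for the database insert.
    public void insert(long sequenceId) {
        table.put(sequenceId, "message-" + sequenceId);
    }

    // Fetch every row above the cursor, advance the cursor, and delete the
    // fetched rows as if a consumer had acknowledged them.
    public List<Long> recoverAndConsume() {
        List<Long> fetched = new ArrayList<>(table.tailMap(lastMessageId, false).keySet());
        for (Long sequenceId : fetched) {
            table.remove(sequenceId);
            lastMessageId = sequenceId;
        }
        return fetched;
    }

    // A broker restart resets the cursor.
    public void restart() {
        lastMessageId = -1;
    }

    public static void main(String[] args) {
        SequenceGapDemo store = new SequenceGapDemo();
        store.insert(384);
        System.out.println(store.recoverAndConsume()); // [384]
        store.insert(386);                             // 386 lands before 385
        System.out.println(store.recoverAndConsume()); // [386], cursor is now 386
        store.insert(385);                             // late insert, below the cursor
        System.out.println(store.recoverAndConsume()); // [] : 385 is stuck
        store.restart();
        System.out.println(store.recoverAndConsume()); // [385]
    }
}
```

The late row is only picked up after the cursor is reset, which matches the observation that the stuck messages are consumed once a new broker takes over.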
I have synchronized the code inside RegionBroker.send, and I no longer see gaps. This is an acceptable workaround for us since we don't process a lot of messages. A more elegant solution might be to set the brokerSequenceId in doAddMessage of JDBCAdapter (I may be wrong; I didn't check whether the brokerSequenceId is used elsewhere).
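The idea behind the workaround can be sketched as follows. This is an illustration only, not the actual patch: OrderedSendSketch, sendLock and the in-memory insertOrder list are hypothetical, and the sketch assumes that the sequence number assignment and the insert can be placed in the same critical section.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: assign the sequence number and insert under one lock,
// so the insert order always matches the sequence order.
public class OrderedSendSketch {
    private final Object sendLock = new Object();
    private long nextSequenceId = 0;
    // Stand-in for the database: records sequence numbers in insert order.
    private final List<Long> insertOrder = new ArrayList<>();

    public void send(String payload) {
        synchronized (sendLock) {
            long sequenceId = nextSequenceId++; // assign the sequence number
            insertOrder.add(sequenceId);        // "insert" before releasing the lock
        }
    }

    public List<Long> insertOrder() {
        synchronized (sendLock) {
            return new ArrayList<>(insertOrder);
        }
    }

    public static boolean isStrictlyIncreasing(List<Long> ids) {
        for (int i = 1; i < ids.size(); i++) {
            if (ids.get(i) <= ids.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    // Runs several producer threads against one sketch broker and returns
    // the order in which the sequence numbers were inserted.
    public static List<Long> runConcurrentProducers(int threads, int messagesPerThread) {
        OrderedSendSketch broker = new OrderedSendSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < messagesPerThread; i++) {
                    broker.send("payload");
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return broker.insertOrder();
    }

    public static void main(String[] args) {
        List<Long> order = runConcurrentProducers(4, 250);
        System.out.println(order.size() + " inserts, in sequence order: " + isStrictlyIncreasing(order));
    }
}
```

The cost is that all sends are serialized on one lock, which is why this is only reasonable at low message rates. Assigning the sequence number at insert time would have the same ordering effect, provided nothing reads the number earlier.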
AMQ-1656, sorry.