
Key: AMQ-443
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Unassigned
Reporter: Kevin Yaussy
Votes: 0
Watchers: 0

ActiveMQ

ReliableTransport / KeepAlive algorithm does not work properly.

Created: 15/Dec/05 01:02 PM   Updated: 15/Jun/06 12:42 PM
Component/s: Broker, Transport
Affects Version/s: 3.2, 3.2.1
Fix Version/s: 4.0

Time Tracking:
Not Specified

File Attachments:
  KeepAliveDaemon.java (Java source, 9 kB), attached 2005-12-15 01:02 PM by Kevin Yaussy
  ReliableTransportChannel.java (Java source, 9 kB), attached 2005-12-15 01:02 PM by Kevin Yaussy
Environment: Solaris 8 / 10. JDK 1.5


Description
The current implementation of KeepAliveDaemon.java will sometimes force disconnections on well-behaved connections. The problem can arise when a connection goes away and the KeepAlive send to that channel blocks while attempting to reconnect. If the reconnection takes a while, other channels that were responding fine may get their connections broken. This happens because of the following code in KeepAliveDaemon.java:

    if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout() * 2) < System.currentTimeMillis()) {

or

    } else if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout()) < System.currentTimeMillis()) {

Because the receipt timestamp is compared against System.currentTimeMillis() at the moment each channel is checked, time spent blocked in a KeepAlive send to one channel can count against the channels checked after it. If a KeepAlive send (in examineChannel) to a broken channel takes longer than some good channel's KeepAliveTimeout, the good connection gets broken even though nothing is wrong with it.
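
To make the failure mode concrete, here is a small deterministic model of the sweep (Java 8 or later). It is a sketch, not the attached KeepAliveDaemon: the `KeepAliveSweepModel` and `Channel` classes, the simulated clock, and the `sendBlocksFor` field are hypothetical. It reuses the two comparisons quoted above and assumes a single thread examines every channel in turn, so a blocking send advances the clock for the channels checked after it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the single-threaded sweep; not the real KeepAliveDaemon.
// A simulated clock replaces System.currentTimeMillis() so timing is deterministic.
class KeepAliveSweepModel {

    // Minimal stand-in for a transport channel; not the ActiveMQ class.
    static class Channel {
        final String name;
        final long lastReceiptTimestamp; // when data last arrived (ms)
        final long keepAliveTimeout;     // ms
        final long sendBlocksFor;        // how long a KeepAlive send to it blocks (ms)

        Channel(String name, long lastReceiptTimestamp, long keepAliveTimeout, long sendBlocksFor) {
            this.name = name;
            this.lastReceiptTimestamp = lastReceiptTimestamp;
            this.keepAliveTimeout = keepAliveTimeout;
            this.sendBlocksFor = sendBlocksFor;
        }
    }

    private long now; // simulated "current time" in ms

    KeepAliveSweepModel(long startTime) {
        this.now = startTime;
    }

    // One pass over all channels using the two comparisons quoted in the report.
    // Returns the names of the channels this pass would disconnect.
    List<String> sweep(List<Channel> channels) {
        List<String> disconnected = new ArrayList<>();
        for (Channel c : channels) {
            if (c.lastReceiptTimestamp + c.keepAliveTimeout * 2 < now) {
                disconnected.add(c.name);   // treated as dead
            } else if (c.lastReceiptTimestamp + c.keepAliveTimeout < now) {
                now += c.sendBlocksFor;     // the KeepAlive send blocks the whole sweep
            }
        }
        return disconnected;
    }
}
```

With a sweep starting at 10000 ms, a broken channel last heard at 8500 ms whose KeepAlive send blocks for 5000 ms, and a healthy channel last heard at 9900 ms (both with a 1000 ms timeout), `sweep` returns only `healthy`: by the time it is examined, the simulated clock reads 15000 ms, past 9900 + 2 × 1000. In this model the healthy channel's timestamp cannot move during the blocked send, because no KeepAlive was sent to it.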

This can, in turn, cause some pretty bad behavior in the Broker. While testing and diagnosing this problem, I saw some brokers in a network of brokers get stuck. The recovery sequence, when interrupted by the connection closures, would sometimes leave the broker hanging while waiting for a receipt, for example during an addConsumer (which eventually calls syncSendWithReceipt).

I have redone the logic in KeepAliveDaemon.java (which required a small change to ReliableTransportChannel as well). This now seems to work.
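
The attached patch is not reproduced here. As one possible way to avoid the problem, and not necessarily what the patch does, the sketch below (Java 8 or later, simulated clock) only declares a channel dead if a KeepAlive was actually sent to it and then went unanswered for a full timeout, with the deadline starting once the send returns. Time the daemon spends blocked on one channel then cannot count against another channel that was never asked to respond. `SentAwareKeepAlive` and the `lastKeepAliveSentTimestamp` field are hypothetical; the report only says ReliableTransportChannel needed a small change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sweep that only times out channels with an unanswered KeepAlive.
class SentAwareKeepAlive {

    static class Channel {
        final String name;
        final long keepAliveTimeout;    // ms
        final long sendBlocksFor;       // how long a KeepAlive send to it blocks (ms)
        long lastReceiptTimestamp;      // updated whenever data arrives
        long lastKeepAliveSentTimestamp = Long.MIN_VALUE; // hypothetical; none sent yet

        Channel(String name, long lastReceiptTimestamp, long keepAliveTimeout, long sendBlocksFor) {
            this.name = name;
            this.lastReceiptTimestamp = lastReceiptTimestamp;
            this.keepAliveTimeout = keepAliveTimeout;
            this.sendBlocksFor = sendBlocksFor;
        }
    }

    private long now; // simulated "current time" in ms

    SentAwareKeepAlive(long startTime) {
        this.now = startTime;
    }

    void advance(long ms) {
        now += ms;
    }

    // Returns the names of the channels this pass would disconnect.
    List<String> sweep(List<Channel> channels) {
        List<String> disconnected = new ArrayList<>();
        for (Channel c : channels) {
            if (c.lastReceiptTimestamp + c.keepAliveTimeout >= now) {
                continue;                               // heard from recently: nothing to do
            }
            boolean keepAliveOutstanding = c.lastKeepAliveSentTimestamp > c.lastReceiptTimestamp;
            if (!keepAliveOutstanding) {
                now += c.sendBlocksFor;                 // the send may block (e.g. reconnecting)
                c.lastKeepAliveSentTimestamp = now;     // the reply deadline starts once it returns
            } else if (c.lastKeepAliveSentTimestamp + c.keepAliveTimeout < now) {
                disconnected.add(c.name);               // a KeepAlive really went unanswered
            }
        }
        return disconnected;
    }
}
```

For example, with a sweep at 10000 ms, a broken channel (last heard at 8500 ms, send blocks for 5000 ms) and a healthy channel (last heard at 9900 ms), both with a 1000 ms timeout, the first sweep disconnects nothing; the healthy channel simply gets its KeepAlive late. If the healthy peer answers at 15050 ms, a second sweep at 16500 ms disconnects only the broken channel. The sweep itself can still be delayed by a blocking send; moving sends onto their own threads would address that, at the cost of more moving parts.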

I'm a bit concerned about the blocking calls, though; this may be a separate issue or bug. It looked like there was a mechanism to cancel outstanding receipt waiters, but every once in a while that mechanism would not get called. When that happens, the broker essentially gets stuck and never recovers.
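
One common way to keep a missed cancellation from hanging a caller forever is to pair explicit cancellation on transport close with a bounded wait. The sketch below (Java 8 or later, using `CompletableFuture`) illustrates that pattern; `ReceiptWaiters` and its methods are hypothetical and do not describe how ActiveMQ 3.2 implements syncSendWithReceipt.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical registry of threads waiting for receipts, keyed by correlation id.
class ReceiptWaiters {
    private final ConcurrentMap<String, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    // Register before sending so that a fast receipt cannot be missed.
    CompletableFuture<Void> register(String correlationId) {
        CompletableFuture<Void> waiter = new CompletableFuture<>();
        pending.put(correlationId, waiter);
        return waiter;
    }

    // Called by the transport's reader thread when a receipt arrives.
    void receiptArrived(String correlationId) {
        CompletableFuture<Void> waiter = pending.remove(correlationId);
        if (waiter != null) {
            waiter.complete(null);
        }
    }

    // Called when the transport closes: fail every outstanding waiter.
    void cancelAll(IOException cause) {
        for (String id : pending.keySet()) {          // weakly consistent; safe to remove
            CompletableFuture<Void> waiter = pending.remove(id);
            if (waiter != null) {
                waiter.completeExceptionally(cause);
            }
        }
    }

    // Waits for the receipt, but never longer than timeoutMs.
    void awaitReceipt(String correlationId, CompletableFuture<Void> waiter, long timeoutMs)
            throws IOException, InterruptedException {
        try {
            waiter.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            throw new IOException("receipt wait cancelled: " + correlationId, e.getCause());
        } catch (TimeoutException e) {
            throw new IOException("no receipt for " + correlationId + " within " + timeoutMs + " ms", e);
        } finally {
            pending.remove(correlationId, waiter);    // no-op if already removed
        }
    }
}
```

A caller would `register` before sending, send the command, then call `awaitReceipt`; the reader thread calls `receiptArrived`, and the close path calls `cancelAll`. If the close path is ever skipped, the timeout turns a permanent hang into an IOException the caller can handle. The timeout is a trade-off: set it too short and slow but healthy brokers cause spurious failures.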



Comments
Hiram Chirino added a comment - 15/Jun/06 03:47 AM
4.0 has implemented a more robust keepalive solution. KeepAlive packets are only sent when the transport has been idle. Also, while the transport is performing a blocking operation, it is not considered idle.
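
A rough sketch of the idea described in this comment (Java 8 or later): a KeepAlive is due only after the write side has been idle for a threshold, and a write that is still in progress, for example one blocked on a slow or reconnecting socket, counts as activity. `IdleAwareKeepAlive`, its methods, and the injectable clock are hypothetical; this is not ActiveMQ 4.0's own code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Hypothetical write-side idle tracker; not ActiveMQ 4.0's own class.
class IdleAwareKeepAlive {
    private final LongSupplier clock;     // e.g. System::currentTimeMillis in real use
    private final long idleThresholdMs;
    private volatile long lastWriteMs;
    private final AtomicInteger writesInProgress = new AtomicInteger();

    IdleAwareKeepAlive(LongSupplier clock, long idleThresholdMs) {
        this.clock = clock;
        this.idleThresholdMs = idleThresholdMs;
        this.lastWriteMs = clock.getAsLong();
    }

    // Call before handing a command to the (possibly blocking) socket write.
    void beginWrite() {
        writesInProgress.incrementAndGet();
    }

    // Call in a finally block once the write returns.
    void endWrite() {
        lastWriteMs = clock.getAsLong();       // publish the timestamp first...
        writesInProgress.decrementAndGet();    // ...then mark the write as finished
    }

    // A KeepAlive is due only if no write is in flight and none finished recently.
    boolean shouldSendKeepAlive() {
        if (writesInProgress.get() > 0) {
            return false;                      // a blocked write still counts as activity
        }
        return clock.getAsLong() - lastWriteMs >= idleThresholdMs;
    }
}
```

A transport would call `beginWrite()` before each send and `endWrite()` in a `finally` block after it, while a timer thread polls `shouldSendKeepAlive()`. `endWrite` updates the timestamp before decrementing the counter, so the timer never sees zero writes in progress together with a stale timestamp.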

Kevin Yaussy added a comment - 15/Jun/06 12:42 PM
Yes - and so far the 4.0 approach is working very well in this respect.