Issue Details (XML | Word | Printable)

Key: AMQ-1146
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Minor Minor
Assignee: Hiram Chirino
Reporter: Mari
Votes: 2
Watchers: 6
Operations

If you were logged in you would be able to see more operations.
ActiveMQ

InactivityMonitor: Transport connection disconnected "Channel was inactive for too long."

Created: 01/Feb/07 12:26 PM   Updated: 19/Oct/07 10:31 AM
Component/s: Broker
Affects Version/s: 4.1.0
Fix Version/s: 5.0.0

Time Tracking:
Issue & Sub-Tasks
Issue Only
Original Estimate: 1 hour
Original Estimate - 1 hour
Remaining Estimate: 1 hour
Remaining Estimate - 1 hour
Time Spent: Not Specified
Remaining Estimate - 1 hour

File Attachments:
  Size
Java Source File InactivityMonitor_changed.java 2007-02-01 12:26 PM Mari 6 kB
Java Source File InactivityMonitor_Original.java 2007-02-01 12:26 PM Mari 6 kB
Environment: Windows Server 2003 running broker and Windows XP running client. Java 1.5.0_08

Patch Info: Patch Available

Sub-Tasks  All   Open   

 Description  « Hide
Sometimes (when the transport is idle for long) the transport gets disconnected with the message "Channel was inactive for too long."

Apparently this is same as reported in http://www.nabble.com/Channel-was-inactive-for-too-long-t1463069.html#a3971744 and http://www.nabble.com/Durable-consumer-reconnect-problem-tf1716817.html#a6147014

Version: 4.1

Background:

The class org.apache.activemq.transport.InactivityMonitor runs monitoring threads to check the read and write of the transport(TCP for example). If it hasn't sent the message during the given period, it sends a KeepAliveInfo to ensure the connectivity. If it doesn't receive the message for a given duration, then it disconnects the transport.

By default, the maximumInactivityDuration is 30000 milliseconds. This is the time interval between the check for read. That is if an end of the transport doesn't receive any message for this period, then it will close the connection. For the read check, the time interval is half of maximumInactivityDuration i.e. 15000 millisecond. That is if a message was not sent during this period, it will send a KeepAliveInfo to ensure the connectivity.

Problem and the recommended fix:

The InactivityMonitor class uses the flags inReceive and commandReceived variables in methods readCheck and onCommand. Typically these two methods are called from different threads. Also, the flags inSend and commandSent are used in methods oneway and writeCheck. Again these two methods are called from different threads. But, the synchronization was missing for these. This seems to be a potential cause for the problem. So, synchronization has been incorporated for these flag usages.

The same class InactivityMonitor is used at both client and server side. In general, the write check (which sends KeepAliveInfo) happens twice during the time when the read check happens once. Probably the client and server machines have performance differences and just to be safer, now the write check is made to happen thrice during the time when the read check happens once.

The original and the changed source files are attached here.

http://www.nabble.com/file/5958/InactivityMonitor_changed.java

http://www.nabble.com/file/5957/InactivityMonitor_Original.java



 All   Comments   Work Log   Change History   Subversion Commits   FishEye   Crucible      Sort Order: Ascending order - Click to sort in descending order
James Strachan added a comment - 12/Jun/07 06:42 AM
Patch applied with thanks!

Manuel Teira added a comment - 18/Oct/07 01:12 AM - edited
I'm having some issues where slow performance in the backend database seems to lead to InactivityIOExceptions. My opinion is that the fix for this issue is causing those problems:

Having in mind that the design of the readCheck/writeCheck monitors is that we always should have a command among two readCheck (produced by normal activity or by the response of a KeepAlive injected by the writeCheck between two readChecks if needed), synchronizing on the readChecker variable seems a bad idea:

If the chain of transportListener.onCommand() is lasting more than the negociated MaxInactivityDuration, we will end with two readCheck executing one after the other, without a writeCheck between them. The first readCheck success, sets the commandReceived to false, and the second one is always going to fail, as no new command has been processed, nor a writeCheck has been executed since the last readCheck.

I don't see the need of synchronizing on the readChecker/writeChecker variables. These are AtomicBooleans and should be safe to access them from different threads (this seemed to be the motivation of the fix). Also, synchronizing the readCheck/writeCheck is actually delaying the execution of the check, and we cannot guarantee anymore the alternating of writeCheck readCheck execution, that is needed for the solution to work.

Also, synchronizing that is avoiding this code in readCheck() :

synchronized (readChecker) {
        if (inReceive.get()) {
                LOG.trace("A receive is in progress");
                return;
        }

to be reached: because inReceive is only set during onCommand, and that is synchronized on readChecker also. Furthermore, the right way to do this is returning here when a onCommand is performed at the same time, and not waiting, to not break the sequence of readCheck/writeCheck invocations.

In few words, I think that this change breaks the InactivityMonitor, and my opinion is that it should be rolled back. Also in the 4.1 subtask that I opened some time ago.

Any comments?


Manuel Teira added a comment - 18/Oct/07 03:44 AM
After discussing the problem with Rob Davies in the IRC channel, I'm reopening this issue to rollback the synchronization for the readCheck/writeCheck.

Hiram Chirino added a comment - 19/Oct/07 10:31 AM
tightened up the synchronizations in trunk. Now we only synchronize to avoid 2 concurrent sends going the next transport or to avoid 2 concurrent commands / exceptions from being delivered to the transport listener.