Bug 53061 - tomcat asynchronous invocation problem
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 7
Classification: Unclassified
Component: Catalina
Version: 7.0.25
Hardware: Other Linux
Importance: P2 major
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
Reported: 2012-04-11 07:30 UTC by Slava
Modified: 2012-05-10 17:47 UTC

Attachments
a simple test reproducing the problem (5.30 KB, application/zip)
2012-04-11 07:30 UTC, Slava
war (512.15 KB, application/octet-stream)
2012-05-10 07:59 UTC, Slava

Description Slava 2012-04-11 07:30:35 UTC
Created attachment 28581
a simple test reproducing the problem

We encountered a problem with asynchronous request processing (Tomcat 7 with Servlet 3.0).
Description:  
One client continuously sends POST requests to the server. On the server side, an AsyncContext with a timeout of 20 seconds is created for each request:
    AsyncContext asyncContext = req.startAsync(req, resp);
    asyncContext.setTimeout(20000);
As expected, the requests complete after approximately 20 seconds. Then another client also begins to send requests to the server, but these are explicitly completed after 500 milliseconds, something like this:
    AsyncContext asyncContext = req.startAsync(req, resp);
    asyncContext.setTimeout(20000);
    try {
        Thread.sleep(500); // hold the request for 500 ms
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
    }
    asyncContext.complete();

The problem is that once the second client is running, all requests from the first client that are waiting for their timeout get stuck and are not released (the onTimeout method of AsyncListener is not called) until the second client stops sending requests.
It looks like the problem occurs on Linux but not on Windows.

I attached a simple test that may help to reproduce this issue. The servlet in the test accepts a URL parameter “complete”.
When “complete=1”, the request is completed after 500 ms.
Otherwise the request waits until the timeout (20 seconds).
Run a client that periodically sends requests to /servlet?complete=0. (I tested it with 10 parallel threads, each running in a loop.)
Then run another client that periodically sends requests to /servlet?complete=1. (This client can use a single thread.)
Observe that the first client does not receive any responses while the second client is running.
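
The two clients described above could be sketched roughly as follows. This is only an illustration, not the code from the attachment: the class name ReproClient, the base URL argument and the command-line layout are assumptions made for the example. Only the /servlet path and the “complete” parameter come from the description.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ReproClient {

    // Builds the request URL; complete=1 asks the servlet to complete after 500 ms,
    // complete=0 leaves the request waiting for the 20 s async timeout.
    static String targetUrl(String baseUrl, boolean complete) {
        return baseUrl + "/servlet?complete=" + (complete ? "1" : "0");
    }

    // Sends one POST with an empty body and returns the HTTP status code.
    static int post(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setReadTimeout(60_000); // longer than the 20 s async timeout
        conn.getOutputStream().close(); // empty request body
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Usage: java ReproClient <baseUrl> <0|1> <threads>
    // The base URL (host, port, context path) is hypothetical and depends on the deployment.
    public static void main(String[] args) {
        String baseUrl = args[0];
        boolean complete = "1".equals(args[1]);
        int threads = Integer.parseInt(args[2]);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                while (true) {
                    long start = System.nanoTime();
                    try {
                        int status = post(targetUrl(baseUrl, complete));
                        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                        System.out.println("status=" + status + " after " + elapsedMs + " ms");
                    } catch (IOException e) {
                        System.out.println("request failed: " + e);
                        return; // stop this thread on failure
                    }
                }
            }).start();
        }
    }
}
```

Starting one instance with arguments such as "0 10" and then a second with "1 1" matches the scenario above. The elapsed time printed by the first instance shows whether its requests are still being released after about 20 seconds.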
Comment 1 Slava 2012-04-11 07:46:56 UTC
Reproduced on: 
 - Tomcat 7.0.23, 7.0.25 and 7.0.26
 - Java 1.6.0_18 and 1.6.0_29 (64 bit)
 - Linux CentOS release 5.6 (Final) 64 bit
 - NIO connector
Comment 2 Mark Thomas 2012-05-09 20:24:35 UTC
I extracted the key elements from this test case into my own test web application, as that was quicker than downloading Maven, installing it and figuring out how to use it.

The test worked for me on both Linux and Windows using the NIO connector.

If you still see this problem, feel free to re-open this issue but you will need to provide a ready-to-run WAR that demonstrates the issue along with the source for that WAR.
Comment 3 Slava 2012-05-10 07:59:00 UTC
Created attachment 28754
war

Added a WAR to go with the provided source.
Comment 4 Slava 2012-05-10 08:03:19 UTC
We still have this problem and it is critical for our project. I attached a WAR file and reopened the issue.
Comment 5 Mark Thomas 2012-05-10 10:30:12 UTC
Thanks for that. I can now reproduce it. I can also reproduce it on Windows. The key to reproducing it is to refresh the browser more frequently than once a second - the faster the refresh, the clearer the reproduction.

Looking at the NIO source code, there is a test that essentially means that timeouts only get processed after one second of inactivity so under high, constant load, timeouts will never be processed.
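
To illustrate the effect described above, here is a small simulation. It is not Tomcat's NIO code: the class TimeoutStarvationSketch, the 100 ms loop tick and the "periodic" alternative policy are assumptions made for the example. Only the 20 s timeout, the one-second threshold and the 500 ms request rate come from this report. The sketch compares a timeout check that runs only after one second without activity against one that runs once a second regardless of activity.

```java
public class TimeoutStarvationSketch {

    // Returns the simulated time (ms) at which a request started at t=0 is timed out,
    // or -1 if that never happens within runMs.
    // requireIdle = true : timeouts are checked only after 1 s without any activity.
    // requireIdle = false: timeouts are checked once per second regardless of activity
    //                      (a hypothetical alternative, not necessarily the actual fix).
    static long simulate(boolean requireIdle, long eventIntervalMs, long runMs) {
        final long asyncTimeoutMs = 20_000; // request timeout from the report
        final long checkIntervalMs = 1_000; // the "one second" threshold
        final long tickMs = 100;            // simulated poller loop period (assumption)

        long lastActivity = 0;
        long lastCheck = 0;
        for (long now = tickMs; now <= runMs; now += tickMs) {
            // An eventIntervalMs of 0 or less means there is no competing traffic.
            if (eventIntervalMs > 0 && now % eventIntervalMs == 0) {
                lastActivity = now;
            }
            boolean due = requireIdle
                    ? now - lastActivity >= checkIntervalMs
                    : now - lastCheck >= checkIntervalMs;
            if (due) {
                lastCheck = now;
                if (now >= asyncTimeoutMs) {
                    return now; // the pending request is finally timed out
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("idle-gated, no other traffic:   " + simulate(true, 0, 60_000));
        System.out.println("idle-gated, event every 500 ms: " + simulate(true, 500, 60_000));
        System.out.println("periodic, event every 500 ms:   " + simulate(false, 500, 60_000));
    }
}
```

In this model the idle-gated check times the request out at 20000 ms when there is no other traffic, but returns -1 (never, within the 60 s run) when an event arrives every 500 ms. The periodic check still fires at 20000 ms under the same load.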

I find it hard to believe it has always been like that so I'll need to go back and research who changed it (probably me), when it changed (probably the refactoring) and why it changed (probably an error on my part). I should be able to get that done pretty quickly.
Comment 6 Mark Thomas 2012-05-10 17:47:37 UTC
For once, it wasn't an issue triggered by the refactoring. It looks like a long-standing issue that just hasn't been noticed before now.

This has been fixed in trunk and 7.0.x and will be included in 7.0.28 onwards.