Hi,

I believe I've found a race condition in Tomcat that causes the HTTP port to become non-responsive. It exists in 6.0 and also in 5.5 (although the code has been refactored). I could not find any reference to it in the bug database or the mailing list archives.

Consider a Tomcat instance with maxThreads set to 2, i.e. you have 2 Tomcat threads to service incoming requests. The sequence of events is as follows:

1. Thread 1 and Thread 2 are each servicing a request.
2. A third request comes in.
3. In JIoEndpoint.java, the acceptor thread calls processSocket(), which calls getWorkerThread(), which in turn calls createWorkerThread().
4. createWorkerThread() returns null since both threads are busy processing the two requests.
5. Here is the race condition, in method getWorkerThread(), in the code shown below:

    protected Worker getWorkerThread() {
        ...
        Worker workerThread = createWorkerThread();
        while (workerThread == null) {
            try {
                synchronized (workers) {
                    workers.wait();
                }
            }
            ...
    }

The acceptor thread evaluates the "while (workerThread == null)" condition and is then switched out by the CPU. The two threads executing the two requests complete, call recycleWorkerThread(), and then go into Worker.await() to wait for the next job. The acceptor thread is then scheduled again, enters the synchronized block and goes into wait(). At this point there are no Worker threads processing requests, so no thread is left to wake up the acceptor thread. The application is non-responsive after this.

A simple solution would be to check whether curThreadsBusy > 0 inside the synchronized block before going into wait() in getWorkerThread(), OR to widen the critical section so that it includes the while loop.
Thanks,
Harshad

Stack traces below:

"bda19102143" id=1578 in WAITING on lock=org.apache.tomcat.util.net.jioendpoint$wor...@13aa4ee3
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:416)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:442)
    at java.lang.Thread.run(Thread.java:619)

"http-8091-Acceptor-0" id=43 in WAITING on lock=org.apache.tomcat.util.net.jioendpoint$workerst...@13bd7b6a
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.tomcat.util.net.JIoEndpoint.getWorkerThread(JIoEndpoint.java:700)
    at org.apache.tomcat.util.net.JIoEndpoint.processSocket(JIoEndpoint.java:731)
    at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:313)
    at java.lang.Thread.run(Thread.java:619)
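For illustration, the second suggested remedy (widening the critical section so that the availability check and the wait() happen under the same lock) might look roughly like the sketch below. This is not the Tomcat code or the committed patch: the class WorkerPoolSketch and the methods acquireWorker(), recycleWorker() and busy() are hypothetical, and only the names workers, curThreadsBusy and maxThreads are borrowed from the report. A plain counter stands in for the real Worker objects.

```java
// Hypothetical sketch, not Tomcat source: a counter replaces the real Worker stack.
class WorkerPoolSketch {
    private final Object workers = new Object(); // monitor, named after the field in the report
    private final int maxThreads;
    private int curThreadsBusy = 0;

    WorkerPoolSketch(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    /** Blocks until a worker slot is free, then claims it. */
    void acquireWorker() throws InterruptedException {
        synchronized (workers) {
            // The check and the wait share one critical section, so a worker
            // cannot be recycled (and its notify lost) between the two steps.
            while (curThreadsBusy >= maxThreads) {
                workers.wait();
            }
            curThreadsBusy++;
        }
    }

    /** Called by a worker when it finishes a request. */
    void recycleWorker() {
        synchronized (workers) {
            curThreadsBusy--;
            workers.notify(); // wakes the acceptor if it is waiting
        }
    }

    int busy() {
        synchronized (workers) {
            return curThreadsBusy;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WorkerPoolSketch pool = new WorkerPoolSketch(2);
        pool.acquireWorker(); // request 1
        pool.acquireWorker(); // request 2

        Thread acceptor = new Thread(() -> {
            try {
                pool.acquireWorker(); // request 3: waits if both slots are still taken
                System.out.println("acceptor obtained a worker");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "acceptor");
        acceptor.start();

        // Both workers finish, in whatever interleaving the scheduler picks.
        pool.recycleWorker();
        pool.recycleWorker();

        acceptor.join();
        System.out.println("busy workers: " + pool.busy());
    }
}
```

Because the acceptor re-checks the condition while holding the monitor, it either sees a free slot and never waits, or it is already waiting when the recycling thread calls notify(); in neither interleaving is the wakeup lost.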
Thanks for the report and the associated analysis. This issue appears to only affect 6.0.x. The issue had already been fixed in the NIO connector, so I have proposed the same fix for the BIO and APR connectors.
> This issue appears to only affect 6.0.x.

The above phrase means only that it does not affect trunk, because Workers are no longer used there. This issue does affect TC 5.5.x.
Created attachment 25225 [details] 2010-04-02_tc6_bug48843.patch
Created attachment 25226 [details] 2010-04-02_tc55_bug48843.patch
The above patches were proposed for 6.0 and 5.5.
The patch was applied to 5.5 in r934922, will be in 5.5.30 onwards.
The patch was applied to 6.0 in r950341, will be in 6.0.27 onwards.
Fixed a similar issue with AprEndpoint.Poller and AprEndpoint.Sendfile in trunk in r950851. A backport has been proposed for 6.0. The 5.5 code is similar, but I have not prepared the patch for it yet.

Reopening the issue to track this additional fix for AprEndpoint.
Created attachment 25529 [details]
2010-06-04_tc55_bug48843_c8.patch

This patch is a backport of r950851 to tc5.5.x. It will be proposed for 5.5.
Regarding the AprEndpoint.Poller and AprEndpoint.Sendfile fix (comment 8 and below, r950851 and attachment 25529):

It is not a deadlock there. It is a missed wakeup on the add queue in AprEndpoint.Poller.add() and AprEndpoint.Sendfile.add(). Tomcat does not stop processing requests, and the next request will wake up the queue.

The fix also changes the handling of unexpected errors when processing the add queue in AprEndpoint.Poller.run() and AprEndpoint.Sendfile.run(), by zeroing the addCount variable.
The AprEndpoint.Poller & Sendfile fix was applied to 6.0 in r953010 and will be in 6.0.27 onwards.
The second issue has been fixed in 5.5.x and will also be included in 5.5.30 onwards.