Bug 48843 - Tomcat Acceptor Thread goes into wait() and it will never come back
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 6
Classification: Unclassified
Component: Catalina
Version: 6.0.18
Hardware: All
OS: All
Importance: P2 major
Target Milestone: default
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-02 17:43 UTC by Harshad Kamat
Modified: 2014-02-17 13:53 UTC



Attachments
2010-04-02_tc6_bug48843.patch (3.26 KB, patch)
2010-04-02 15:41 UTC, Konstantin Kolinko
2010-04-02_tc55_bug48843.patch (1.08 KB, patch)
2010-04-02 15:51 UTC, Konstantin Kolinko
2010-06-04_tc55_bug48843_c8.patch (7.69 KB, patch)
2010-06-04 16:26 UTC, Konstantin Kolinko

Description Harshad Kamat 2010-03-02 17:43:49 UTC
Hi,

I believe I've found a race condition in Tomcat that causes the HTTP port to
become unresponsive. It exists in 6.0 and also in 5.5 (although the code there
has been refactored).
I could not find any reference to it in the bug database or the mailing list
archives.

Consider a Tomcat instance with maxThreads set to 2, i.e. you have two Tomcat
worker threads to service incoming requests.
The sequence of events is as follows:
1. Threads 1 and 2 are each servicing a request.
2. A third request comes in.
3. In class JIoEndpoint.java, the acceptor thread calls processSocket(),
which calls getWorkerThread(), which in turn calls createWorkerThread().
4. createWorkerThread() returns null, since both threads are busy processing
the two requests.
5. Here is the race condition, in method getWorkerThread(), shown in the code
below:

protected Worker getWorkerThread() {
    ...
    Worker workerThread = createWorkerThread();
    // The null check runs outside the synchronized block, so a worker
    // can be recycled (and its notify() fire) before wait() is entered.
    while (workerThread == null) {
        try {
            synchronized (workers) {
                workers.wait();
            }
        }
        ...
}

The acceptor thread evaluates the "while (workerThread == null)" condition and
is then switched off the CPU.
The two threads servicing the two requests complete, run
recycleWorkerThread(), and go into Worker.await() to wait for their next job.
The acceptor thread is scheduled back onto the CPU, enters the synchronized
block, and goes into wait().

At this point there are no Worker threads out there processing requests, and
therefore there is no thread left to wake up the acceptor thread.
The application is unresponsive from this point on.
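Stripped of the Tomcat specifics, the window is the classic check-then-wait
mistake. A hypothetical sketch (none of these names are Tomcat's) of where the
notification is lost:

class CheckThenWait {
    private final Object workers = new Object();
    private volatile boolean workerAvailable = false;

    // Acceptor side (buggy): the availability check happens outside
    // the synchronized block, so release() can set the flag and call
    // notifyAll() in the gap marked below, and the signal is lost.
    void acquireBuggy() throws InterruptedException {
        while (!workerAvailable) {
            // <-- if release() runs entirely in this gap, the acceptor
            //     waits below and nothing will ever wake it again
            synchronized (workers) {
                workers.wait();
            }
        }
    }

    // Worker side: make a worker available and signal the acceptor.
    void release() {
        synchronized (workers) {
            workerAvailable = true;
            workers.notifyAll();
        }
    }
}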

A simple solution would be to check whether curThreadsBusy > 0 inside the
synchronized block before going into wait() in method getWorkerThread(),
OR to widen the critical section to include the while loop.
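For illustration, the second option would look roughly like this (a sketch
only, not a tested patch; the elided parts are unchanged):

protected Worker getWorkerThread() {
    ...
    synchronized (workers) {
        Worker workerThread = createWorkerThread();
        while (workerThread == null) {
            try {
                // The null check and the wait() now happen under the
                // same lock that recycleWorkerThread() notifies on,
                // so the wakeup can no longer be lost.
                workers.wait();
            }
            ...
        }
        ...
    }
}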

Thanks,
Harshad

Stack traces below:

"bda19102143" id=1578 in WAITING on
lock=org.apache.tomcat.util.net.JIoEndpoint$Wor...@13aa4ee3
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:416)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:442)
    at java.lang.Thread.run(Thread.java:619)

"http-8091-Acceptor-0" id=43 in WAITING on
lock=org.apache.tomcat.util.net.JIoEndpoint$WorkerSt...@13bd7b6a
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.tomcat.util.net.JIoEndpoint.getWorkerThread(JIoEndpoint.java:700)
    at org.apache.tomcat.util.net.JIoEndpoint.processSocket(JIoEndpoint.java:731)
    at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:313)
    at java.lang.Thread.run(Thread.java:619)
Comment 1 Mark Thomas 2010-03-29 12:43:04 UTC
Thanks for the report and the associated analysis.

This issue appears to only affect 6.0.x.

The issue had already been fixed in the NIO connector, so I have proposed the same fix for the BIO and APR connectors.
Comment 2 Konstantin Kolinko 2010-04-02 13:59:46 UTC
> This issue appears to only affect 6.0.x.
The above phrase means only that it does not affect trunk; that is because Workers are no longer used there. This issue does affect TC 5.5.x.
Comment 3 Konstantin Kolinko 2010-04-02 15:41:21 UTC
Created attachment 25225
2010-04-02_tc6_bug48843.patch
Comment 4 Konstantin Kolinko 2010-04-02 15:51:29 UTC
Created attachment 25226
2010-04-02_tc55_bug48843.patch
Comment 5 Konstantin Kolinko 2010-04-02 15:58:04 UTC
The above patches were proposed for 6.0 and 5.5.
Comment 6 Konstantin Kolinko 2010-04-16 10:39:19 UTC
The patch was applied to 5.5 in r934922 and will be included in 5.5.30 onwards.
Comment 7 Konstantin Kolinko 2010-06-01 22:41:29 UTC
The patch was applied to 6.0 in r950341 and will be included in 6.0.27 onwards.
Comment 8 Konstantin Kolinko 2010-06-02 21:54:15 UTC
Fixed a similar issue with AprEndpoint.Poller and AprEndpoint.Sendfile in trunk in r950851.

Backport proposed for 6.0.

The 5.5 code is similar, but I have not prepared the patch for it yet.
Reopening the issue to track this additional fix for AprEndpoint.
Comment 9 Konstantin Kolinko 2010-06-04 16:26:23 UTC
Created attachment 25529
2010-06-04_tc55_bug48843_c8.patch

This patch is a backport of r950851 to tc5.5.x. It will be proposed for 5.5.
Comment 10 Konstantin Kolinko 2010-06-07 12:20:24 UTC
Regarding the AprEndpoint.Poller and AprEndpoint.Sendfile fix (comment 8 and below, r950851 and attachment 25529):

It is not a deadlock there, but a missed wakeup in the add queue in AprEndpoint.Poller.add() and AprEndpoint.Sendfile.add(). Tomcat does not stop processing requests, and the next request will wake up the queue.

The fix also changes the handling of unexpected errors when processing the add queue in AprEndpoint.Poller.run() and AprEndpoint.Sendfile.run(), by zeroing the addCount variable.
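For illustration, the corrected add-queue pattern looks roughly like the sketch below. This is a simplified stand-in, not the actual AprEndpoint code: addCount matches the field named above, but the class and the other names are hypothetical.

import java.util.ArrayList;
import java.util.List;

class PollerSketch implements Runnable {
    private final List<Long> addQueue = new ArrayList<Long>();
    private int addCount = 0;

    // Acceptor side: enqueue a socket and wake the poller. Signalling
    // while still holding the lock closes the missed-wakeup window.
    public void add(long socket) {
        synchronized (this) {
            addQueue.add(Long.valueOf(socket));
            addCount++;
            this.notify();
        }
    }

    public void run() {
        while (true) {
            List<Long> drained;
            synchronized (this) {
                while (addCount == 0) {
                    try {
                        this.wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                try {
                    drained = new ArrayList<Long>(addQueue);
                } finally {
                    // Zero addCount even if draining threw, so the
                    // counter cannot fall out of sync with the queue.
                    addCount = 0;
                    addQueue.clear();
                }
            }
            // drained now holds the sockets to register with the
            // poller proper.
        }
    }
}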
Comment 11 Konstantin Kolinko 2010-06-09 10:09:20 UTC
The AprEndpoint.Poller and Sendfile fix was applied to 6.0 in r953010 and will be in 6.0.27 onwards.
Comment 12 Mark Thomas 2010-06-22 04:55:08 UTC
The second issue has been fixed in 5.5.x and will also be included in 5.5.30 onwards.