Bug 53173 - maxConnections feature hangs the system
Summary: maxConnections feature hangs the system
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 7
Classification: Unclassified
Component: Connectors
Version: 7.0.27
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Duplicates: 53186
Depends on:
Blocks:

Reported: 2012-05-01 13:54 UTC by Filip Hanik
Modified: 2013-11-09 03:56 UTC
CC List: 2 users



Attachments
fix missing count down for maxConnections latch (11.05 KB, application/octet-stream)
2012-05-01 13:54 UTC, Filip Hanik

Description Filip Hanik 2012-05-01 13:54:22 UTC
Created attachment 28704
fix missing count down for maxConnections latch

We've run into a scenario where the JIO Acceptor thread hangs as connections are not counted down properly.

<Executor name="tomcatThreadPool" 
          namePrefix="tomcat-8080-" 
          minSpareThreads="50" 
          maxThreads="300"/>

<Connector port="8080" 
           redirectPort="${bio.https.port}"              
           protocol="org.apache.coyote.http11.Http11Protocol"
           maxKeepAliveRequests="15" 
           executor="tomcatThreadPool" 
           connectionTimeout="20000" 
           acceptCount="100"/>

A thread dump yields:
"http-bio-8080-Acceptor-0" daemon prio=3 tid=0xXXXXXXXX nid=0xXX waiting on condition [0xXXXXXXXX..0xXXXXXXXX]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0xXXXXXXXX> (a org.apache.tomcat.util.threads.LimitLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
	at org.apache.tomcat.util.threads.LimitLatch.countUpOrAwait(LimitLatch.java:99)
	at org.apache.tomcat.util.net.AbstractEndpoint.countUpOrAwaitConnection(AbstractEndpoint.java:660)
	at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:210)
	at java.lang.Thread.run(Thread.java:619)

This, as you may imagine, is a fairly hard use case to reproduce in a simple test case. The easiest way to reproduce it is with the following configuration:
    <Executor name="tomcatThreadPool" 
              namePrefix="catalina-exec-"
              maxThreads="5" 
              minSpareThreads="0" 
              maxQueueSize="15"/>
<Connector port="8080" 
           protocol="HTTP/1.1" executor="tomcatThreadPool"
           connectionTimeout="10000"
           redirectPort="8443" 
           maxConnections="30"/>

This reproduces one failure case: the state machine does not take into account that connections may be rejected by the executor queue, and when that happens it never counts the latch back down.
I'm attaching a patch that fixes this specific use case, but it may not be a complete fix. As a workaround, the patch also introduces a maxConnections="-1" setting that disables connection counting altogether. The -1 setting is important because it gives administrators a workaround while the other edge cases are tracked down.
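With the attached patch, that workaround applied to the connector from the configuration above would look roughly like this; the only change is the added maxConnections attribute:

<Connector port="8080"
           redirectPort="${bio.https.port}"
           protocol="org.apache.coyote.http11.Http11Protocol"
           maxKeepAliveRequests="15"
           executor="tomcatThreadPool"
           connectionTimeout="20000"
           acceptCount="100"
           maxConnections="-1"/>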


I have not been able to reproduce this error with the NIO connector.

There is one more place in the JIoEndpoint that requires handling of RejectedExecutionException:
public boolean processSocketAsync(SocketWrapper<Socket> socket, SocketStatus status)
The exception is currently unhandled there.
Comment 1 Filip Hanik 2012-05-02 17:18:42 UTC
Fixed in trunk r1333114
Fixed in 7.0.x r1333116
Will be made available in Apache Tomcat 7.0.28 and onwards

Additional option added
maxConnections=-1 to disable connection counting
Comment 2 Filip Hanik 2012-05-02 19:42:36 UTC
Adding documentation to track cases where this can happen

May 2, 2012 3:04:03 AM org.apache.tomcat.util.net.NioEndpoint$Acceptor run
SEVERE: Socket accept failed
java.io.IOException: Too many open files
       at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
       at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152)
       at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:784)
       at java.lang.Thread.run(Thread.java:662)
Comment 3 Filip Hanik 2012-05-07 12:47:40 UTC
*** Bug 53186 has been marked as a duplicate of this bug. ***
Comment 4 Filip Hanik 2012-05-23 15:38:45 UTC
Just for documentation purposes: I found the root cause of this problem, and it all makes sense now.

http://svn.apache.org/viewvc/tomcat/tc7.0.x/trunk/java/org/apache/tomcat/util/net/NioEndpoint.java?r1=1127961&r2=1127962&

in r1127962

This change counts the connection up before it has been validated to be working. Prior to this change, the count-up only occurred once a connection was valid and had been added to the poller.
Comment 5 brauckmann 2012-08-31 08:57:45 UTC
Hi Filip.

Is it possible that this bugfix did not completely solve the problem?

When doing a load test I encountered a stuck Tomcat 7.0.29 with all threads in socketRead0 and no thread handling the web application:

   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at java.net.SocketInputStream.read(SocketInputStream.java:121)
        at org.apache.coyote.ajp.AjpProcessor.read(AjpProcessor.java:309)
        at org.apache.coyote.ajp.AjpProcessor.readMessage(AjpProcessor.java:364)
        at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:128)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
        - locked <0x9e2b2098> (a org.apache.tomcat.util.net.SocketWrapper)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)


The Acceptor thread is stuck:

"ajp-bio-127.0.0.1-8009-Acceptor-0" daemon prio=10 tid=0x6eaf2800 nid=0x18ee waiting on condition [0x6e15c000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0xa23cf128> (a org.apache.tomcat.util.threads.LimitLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
        at org.apache.tomcat.util.threads.LimitLatch.countUpOrAwait(LimitLatch.java:115)
        at org.apache.tomcat.util.net.AbstractEndpoint.countUpOrAwaitConnection(AbstractEndpoint.java:718)
        at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:210)
        at java.lang.Thread.run(Thread.java:722)


This looks exactly like the problem that this patch was supposed to fix.

The configuration: Apache httpd with MaxClients 4000, mod_jk, and Tomcat 7.0.29 with a very simple Connector configuration (all thread and connection parameters left at their defaults).

Tested with two different web applications that have nothing in common. The problem occurs only during heavy load.

The problem disappears when the Apache and the Tomcat parameters are adjusted so that MaxClients < maxThreads.
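For illustration only, the Tomcat side of that relationship can be sketched like this; the port and values are examples rather than our actual settings, and Apache httpd's MaxClients (in httpd.conf) has to stay below maxThreads:

<!-- example only: keep Apache httpd's MaxClients below maxThreads -->
<Connector port="8009"
           protocol="AJP/1.3"
           maxThreads="250"
           connectionTimeout="10000"/>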
Comment 6 Konstantin Kolinko 2012-08-31 10:04:34 UTC
(In reply to comment #5)
> Hi Filip.
> 
> Is it possible that this bugfix did not completely solve the problem?
> 
> When doing a load test I encountered a stuck tomcat 7.0.29 with all threads
> in socketRead0 and no thread handling the web application:
> 
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:150)
>         at java.net.SocketInputStream.read(SocketInputStream.java:121)
>         at org.apache.coyote.ajp.AjpProcessor.read(AjpProcessor.java:309)
>         at
> org.apache.coyote.ajp.AjpProcessor.readMessage(AjpProcessor.java:364)

This one is busy reading data on an existing connection. (It is between requests, so no web application classes are mentioned in the stack trace).

It is reading a socket. It cannot serve requests on other sockets. It cannot be used for new requests.

> The Acceptor thread is stuck:
> 
> "ajp-bio-127.0.0.1-8009-Acceptor-0" daemon prio=10 tid=0x6eaf2800 nid=0x18ee
> waiting on condition [0x6e15c000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0xa23cf128> (a
> org.apache.tomcat.util.threads.LimitLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)

This one, yes, is waiting on the counter and would not accept any more request until the counter goes down.

> with a very simple Connector configuration (all thread and connection
> parameters left to default).

> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)

You are using the BIO connector implementation.

Which means (according to the configuration reference for AJP connectors):
maxThreads = 200
maxConnections = maxThreads

So if you have 200 worker threads and all of them are busy "reading" (waiting for data on an existing socket), then it is by design. You are not able to start a 201st thread, so there is no point in accepting the 201st connection.
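Spelled out explicitly, those defaults are equivalent to something like the following; this is only to illustrate the relationship, not a recommended configuration:

<Connector port="8009"
           protocol="AJP/1.3"
           maxThreads="200"
           maxConnections="200"/>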


Anyway, as was written earlier,
> Additional option added
> maxConnections=-1 to disable connection counting
Comment 7 brauckmann 2012-09-03 07:42:41 UTC
Thanks for explaining.
Comment 8 Robert Hardin 2013-11-08 18:41:04 UTC
(in reply to comments #5 and #6)

There still exists a defect in Tomcat 7.0.47 (most likely in the Executor used by the connector threads) where, as described in brauckmann's comment #5, all threads in the Tomcat thread pool begin hanging at peak traffic times. By "peak" I'm referring to complete thread pool saturation. The bug is highly reproducible when maxThreads <= maxConnections in the configuration and the threads are executing logic (JSP, servlet, etc.) with relatively long-running operations (such as reading from a database or file I/O). Once the thread pool is exhausted, and if maxConnections continues to allow and queue new requests pending available threads, the Executor seems to start hanging the Tomcat threads themselves (somehow). It is not the operation the threads are executing that hangs them, nor is it the number of connections still available under the connection count.

It was stated in comment #5 that "The problem disappears when the Apache and the Tomcat parameters are adjusted so that MaxClients < maxThreads", and even the sample configuration in Filip's very first comment, which allows the bug to be recreated, points to the bug manifesting when the number of socket connections allowed by Tomcat exceeds the number of threads available to handle incoming requests:

    <Executor name="tomcatThreadPool" 
              namePrefix="catalina-exec-"
              maxThreads="5" 
              minSpareThreads="0" 
              maxQueueSize="15"/>
<Connector port="8080" 
           protocol="HTTP/1.1" executor="tomcatThreadPool"
           connectionTimeout="10000"
           redirectPort="8443" 
           maxConnections="30"/>

This is why adding maxConnections=-1 as an option (effectively putting no upper limit on accepted client socket connections) actually just makes matters worse. After changing my configuration (where I had previously just specified maxThreads=200) to maxThreads=200 and maxConnections=-1, and running a simple HTTP grinder against a small test web application on Tomcat, I brought all 200 connector threads to their knees in a matter of seconds (Tomcat was completely unresponsive until the server was bounced). Those threads never recovered, and the back-end processes that the web app had started from those threads were all hung as well (until the Tomcat server restart).

To recreate, simply write a simple webapp containing a JSP that executes the system process 'netstat -a' and routes the process stdout to the HTTP response. Then, from an exerciser client, spin up something like 500 threads, each continually submitting HTTP GETs against the Tomcat server (the web app JSP) every half second or so.

The only way I was able to keep the Tomcat thread pool healthy and the server responsive was to make sure maxThreads > maxConnections. Setting maxConnections=-1, or maxConnections >= maxThreads, both result in hung connector threads and ultimately an unresponsive Tomcat server on that connector.
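For reference, a configuration of the shape that stayed healthy in my tests looks roughly like this (the values are illustrative, not my exact settings):

<Connector port="8080"
           protocol="HTTP/1.1"
           maxThreads="200"
           maxConnections="150"
           connectionTimeout="20000"/>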

Best,
Robert
Comment 9 Filip Hanik 2013-11-08 19:32:16 UTC
A workaround for you would be to use the NIO connector, as it doesn't tie up threads between keep-alive requests; also, don't set any maxConnections limit.

<Connector port="8080" 
           protocol="org.apache.coyote.http11.Http11NioProtocol" ...
Comment 10 Robert Hardin 2013-11-08 20:25:12 UTC
Thank you Filip. I'll give it a try.

Would I continue to use the maxThreads parameter to throttle traffic (to prevent client requests from burning up my JVM)? If not, how would I go about configuring an NIO connector to max out at ~200 simultaneous request fulfillments?

Are there any known downsides to continuing to use BIO with a maxThreads > maxConnections relationship?

Thanks
Comment 11 Konstantin Kolinko 2013-11-09 03:56:51 UTC
Restoring the original Version (7.0.27) and Hardware field values.

(In reply to Robert Hardin from comment #8)
> the Executor seems to
> start hanging the Tomcat threads themselves (somehow)

Take a Thread dump. Show it.

> Are there any known down-sides

Ask on the mailing list. Bugzilla is not a support forum.