| Summary: | AJP proxy not thread-safe? |
|---|---|
| Product: | Apache httpd-2 |
| Component: | mod_proxy |
| Reporter: | Brad Boyer <brad_boyer> |
| Assignee: | Apache HTTPD Bugs Mailing List <bugs> |
| Status: | RESOLVED FIXED |
| Severity: | normal |
| CC: | yneko |
| Priority: | P2 |
| Version: | 2.2.0 |
| Target Milestone: | --- |
| Hardware: | Other |
| OS: | Linux |
| Attachments: | Debugging output patch referenced in comment #9 |
| | make mod_proxy_balancer init_balancer_member call ap_proxy_initialize_worker |
Description
Brad Boyer
2006-01-11 20:44:12 UTC
In order to make it reconnect, please add retry=1 to your ProxyPass directive (see also the retry parameter at http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass). Have you checked the Tomcat logs? The first error seems to indicate that Tomcat is no longer responding.

I have experienced the same bug. The problem also seems to cause the Tomcat server (5.0.28) to run out of threads; I assume the problem is also causing the AJP threads not to be closed, so I assume using a retry will not help. Yet to confirm whether using prefork instead of worker fixes the problem, but I'll try this and add the result to the bug report.

ajp_read_header: ajp_ilink_receive failed
(120006)APR does not understand this error code: proxy: read response failed from (null) (<ip_addr_here>)

Then later:

proxy: AJP: failed to make connection to backend: <ip_addr_here>
(110)Connection timed out: proxy: AJP: attempt to connect to <ip_addr_here>:8009 (<ip_addr_here>) failed

It looks like Tomcat can recover if it is under light load and the threads time out, but when the load increases Tomcat dies as all the threads get used up. Maybe PR#36495 is a similar bug.

Could you please provide your proxy config?

The config in bug 36495 is very similar to my original situation: Apache httpd 2.2 and Tomcat running together on a single Linux box, with the httpd config using ProxyPass to redirect to 127.0.0.1:8009. The messages once the poster moved to 2.2 are basically the same. Here's the proxy config I have (with some URL segments changed to xxx):

    ProxyRequests off
    <Proxy *>
        Order allow,deny
        Deny from all
    </Proxy>
    <Proxy ajp://127.0.0.1:8009/*>
        Order deny,allow
        Allow from all
    </Proxy>
    ProxyPass /xxx ajp://127.0.0.1:8009/xxx

After further testing this appears to be a problem with a misconfigured Tomcat; the AJP connector behaves differently in worker mode than in prefork and requires a larger number of Tomcat AJP threads. Tested this under extremely heavy load and all seems to work OK with no problems.
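A minimal illustration of the retry suggestion above; the path and backend are taken from the ProxyPass line in this thread, and the 1-second value is only an example (mod_proxy's default retry interval is 60 seconds):

```apache
# Retry a worker that is in the error state after 1 second instead of
# the default 60, so httpd reconnects promptly once Tomcat recovers.
ProxyPass /xxx ajp://127.0.0.1:8009/xxx retry=1
```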
This is all documented in the connector docs; I guess I should have picked this up earlier. Thanks for the feedback.

Could you please add the link to the connector docs where you found the information that solved your problem?

Fixing the Tomcat configuration did help, but it still appears that we are having the same problem where Tomcat will run out of threads using AJP mod_proxy in worker mode. We cannot reproduce this problem in testing, only on the live systems. No real errors from Apache except this, by which point Tomcat has run out of threads, which I assume is why this error appears:

[error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header

We also get this error in prefork mode, but it does not seem to cause Tomcat to die.

I'm seeing similar errors. The setup is Apache 2.2.0 with mod_proxy_ajp and Tomcat 5.0.27. I tried RewriteRules with [P] first, then switched to ProxyPass - same result. This is the log of the virtual host (servernames replaced by $MYSERVER, IPs replaced by $LOCALIP):

==> httpd_error_log <==
[Thu Mar 23 16:23:09 2006] [error] ajp_read_header: ajp_ilink_receive failed
[Thu Mar 23 16:23:09 2006] [error] (120006)APR does not understand this error code: proxy: read response failed from (null) (localhost)

This is the global Apache log:

==> /opt/httpd/logs/error_log <==
[Thu Mar 23 16:23:09 2006] [error] [client $LOCALIP] proxy: error reading status line from remote server $MYSERVER, referer: $REFERER
[Thu Mar 23 16:23:09 2006] [error] [client $LOCALIP] proxy: Error reading from remote server returned by $URL, referer: $REFERER
[Thu Mar 23 16:23:09 2006] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header

Nothing fancy shows up in catalina.out or the Tomcat logs, just a lot of:

Mar 23, 2006 4:24:40 PM org.apache.jk.common.ChannelSocket processConnection
INFO: connection timeout reached

but these seem normal and occur all the time.
I do _not_ see this behaviour under heavy load; I see it almost instantly. Things get worse: connections from Apache to Tomcat stay open:

[root@server root]# netstat -tn|sed -n -e '1,2p;/8059/p'
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State
tcp     1      0    127.0.0.1:46802  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46814  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46790  127.0.0.1:8059   CLOSE_WAIT
tcp     0      0    127.0.0.1:46826  127.0.0.1:8059   ESTABLISHED
tcp     1      0    127.0.0.1:46729  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46777  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46753  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46765  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46697  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46604  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46636  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46544  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46551  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46518  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46505  127.0.0.1:8059   CLOSE_WAIT
tcp     1      0    127.0.0.1:46414  127.0.0.1:8059   CLOSE_WAIT
tcp   794      0    127.0.0.1:8059   127.0.0.1:46328  CLOSE_WAIT
tcp     0      0    127.0.0.1:8059   127.0.0.1:46826  ESTABLISHED

There are times when no ESTABLISHED connection is available - it seems to take quite long for Tomcat to recover. Note that the Recv-Q has data in it!

server.xml snippet:

<Connector protocol="AJP/1.3" address="127.0.0.1" port="8059"
    minProcessors="10" maxProcessors="250" maxPostSize="0"
    enableLookups="false" useBodyEncodingForURI="true"
    acceptCount="25" debug="0" redirectPort=""
    connectionTimeout="60000" />

(Yes, there are probably some superfluous settings there.) I do not see an increase in thread count or Apache child count (using the prefork MPM).

Created attachment 18031 [details]
Debugging output patch referenced in comment #9.

I believe I'm seeing the same problem with my build of Apache 2.2.0 connecting to an instance of Tomcat 5.5.10.
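As a side note, a netstat listing like the one above can be condensed into a per-state count. This sketch pipes a few sample lines (abridged from the output above) through awk; on a live box the printf would be replaced by `netstat -tn`, and 8059 is this thread's connector port, not a general value:

```shell
# Count AJP backend connections by TCP state. In `netstat -tn` output,
# column 5 is the foreign address and column 6 the connection state.
printf '%s\n' \
  'tcp 1 0 127.0.0.1:46802 127.0.0.1:8059 CLOSE_WAIT' \
  'tcp 1 0 127.0.0.1:46814 127.0.0.1:8059 CLOSE_WAIT' \
  'tcp 0 0 127.0.0.1:46826 127.0.0.1:8059 ESTABLISHED' |
  awk '$5 ~ /:8059$/ {print $6}' | sort | uniq -c | sort -rn
```

On a live system this becomes `netstat -tn | awk '$5 ~ /:8059$/ {print $6}' | sort | uniq -c | sort -rn`, which makes a pile-up of CLOSE_WAIT sockets like the one above easy to spot.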
THE SETUP

I'm running this on a dual CPU (Intel(R) Xeon(TM) CPU 2.80GHz) machine running Linux 2.4.21-15.ELsmp. Apache server-info reveals:

-D APACHE_MPM_DIR="server/mpm/worker"
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
-D APR_USE_SYSVSEM_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D HTTPD_ROOT="/usr/local/apache/2.2.0"
-D SUEXEC_BIN="/usr/local/apache/2.2.0/bin/suexec"
-D DEFAULT_ERRORLOG="logs/error_log"
-D AP_TYPES_CONFIG_FILE="conf/mime.types"
-D SERVER_CONFIG_FILE="conf/httpd.conf"

I'm using mod_proxy to balance incoming requests to an Apache VirtualHost to a mounted Tomcat service via AJP. I think I've included the important Apache configuration below:

<Proxy balancer://tomcat>
    BalancerMember ajp://server.mydomain.org:8009 route=server.mydomain.org
</Proxy>

<VirtualHost vhost.mydomain.org:80>
    RewriteEngine on
    RewriteCond %{ENV:HTTPD_BASE}/logs/myapp.pause -f
    RewriteRule .* - [forbidden,last]
    RewriteRule ^(/+test(?:/.*)?)$ /myapp$1 [env=internal:yes]
    RewriteCond %{ENV:internal} =yes
    RewriteRule ^(/+myapp/.*) $1 [passthrough,last]
    RewriteRule ^/+myapp/(.*) /$1 [redirect=permanent,last]
    ProxyPass /myapp balancer://tomcat/myapp stickysession=JSESSIONID nofailover=on
    ProxyPassReverse / http://vhost.mydomain.org/myapp/
    ProxyPassReverseCookiePath /myapp/ /
</VirtualHost>

My Tomcat Connector in server.xml is straightforward:

<Connector protocol="AJP/1.3" address="${catalina.hostname}" port="8009" enableLookups="false" />

So a user connects to http://vhost.mydomain.org/test and mod_proxy forwards the request via AJP to http://server.mydomain.org:8009/myapp/test

The 'test' servlet is a simple one. It computes a random number between 0 and 1000, and sleeps for that many milliseconds, before printing a one-line response to the client.
THE PROBLEM

Using this stripped-down configuration, what I see when I use ab(1) to load test the service is that Tomcat is forced to keep creating new threads to service incoming connections, and it leaves them in the Keepalive state after ab(1) has finished. Tomcat also indicates it is *using* those threads (perhaps polling on them?). Every time I make a new request via mod_proxy, a new thread is started in Tomcat. However, when I query Tomcat directly, I see a thread get created, the request gets serviced, and the thread is switched to the Ready state, indicating it is now able to accept new requests.

If I run

netstat -t | egrep "^tcp.*:8009"

on the server, I see a constantly growing number of connections between Apache and Tomcat. When I query Tomcat directly, I do not see this growing pool of connections. This led me to believe the problem was mod_proxy keeping connections open (perhaps the backend pooling code), but failing to either reuse them, or perhaps failing to properly close them on cleanup.

THE DEBUGGING

First, I added some debugging to proxy_util.c. I've attached it to this comment. Basically all I did was print out a log message when it first initializes, and then each time a connection is acquired. Now, starting up Apache I see this in the log:

[Wed Apr 05 21:09:35 2006] [error] 0: worker->hmax: 25, worker->cp->res: not null
[Wed Apr 05 21:09:35 2006] [debug] proxy_util.c(1666): proxy: initialized worker 0 in child 16610 for (server.mydomain.org) min=0 max=25 smax=25
[Wed Apr 05 21:09:35 2006] [error] 0: worker->hmax: 25, worker->cp->res: not null
[Wed Apr 05 21:09:35 2006] [debug] proxy_util.c(1666): proxy: initialized worker 2 in child 16610 for (*) min=0 max=25 smax=25

For my test I first see if any connections exist to Tomcat (there should not be any):

; netstat -t | egrep "^tcp.*:8009" | tr -s " "

As I expect, no connections exist.
Next, on a different machine, I make a single request to my virtual host:

ab -n 1 http://vhost.mydomain.org/test

It completes, and tells me it was successful. I see a bunch of debugging in the log indicating mod_proxy is working. In the chatter, I see my own debugging lines:

[Wed Apr 05 21:09:43 2006] [error] 1: worker->hmax: 0, worker->cp->res: not null
[Wed Apr 05 21:09:43 2006] [error] 2: called connection_constructor

This surprises me for two reasons. First, I was expecting worker->hmax to be 25, not 0. Second, I expected worker->cp->res to have a value, not to be null. [As an aside, subsequent testing showed that if I added 'max=25' to the BalancerMember configuration directive, worker->hmax would be 25 in requests; the null value for worker->cp->res did not change.]

Now, I check on my server to see if any connections exist (I expect a connection, due to the connection pooling):

; netstat -t | egrep "^tcp.*:8009" | tr -s " "
tcp 0 0 server.mydomain.org:8009 server.mydomain.org:39642 ESTABLISHED
tcp 0 0 server.mydomain.org:39642 server.mydomain.org:8009 ESTABLISHED

Now, if I run a second request, I would have expected mod_proxy_ajp to reuse the existing connection.
But look what happens after I ran my ab command a second time:

[Wed Apr 05 21:09:46 2006] [error] 1: worker->hmax: 0, worker->cp->res: null
[Wed Apr 05 21:09:46 2006] [error] 2: called connection_constructor

And netstat shows two *new* connections, for a total of four:

; netstat -t | egrep "^tcp.*:8009" | tr -s " "
tcp 0 0 server.mydomain.org:8009 server.mydomain.org:39753 ESTABLISHED
tcp 0 0 server.mydomain.org:8009 server.mydomain.org:39642 ESTABLISHED
tcp 0 0 server.mydomain.org:39753 server.mydomain.org:8009 ESTABLISHED
tcp 0 0 server.mydomain.org:39642 server.mydomain.org:8009 ESTABLISHED

I tried making a number of connections, and watched the connections grow until they filled all available threads on my Tomcat container (which was set to the default maxThreads of 200). First, I reset everything:

; apachectl stop
; catalina.sh stop
; apachectl start
; catalina.sh start
; netstat -t | egrep "^tcp.*:8009" | tr -s " "|wc -l
0

And then on my test client I make 200 connections:

ab -n 200 http://vhost.mydomain.org/test

and then my server shows over 400 established connections:

; netstat -t | egrep "^tcp.*:8009" | tr -s " "|grep -c ESTABLISHED
466

Ack! If I tweak the proxy code, setting these 'close' flags to true:

conn->close_on_recycle = 1;
conn->close = 1;

then Apache cleans itself up, and Tomcat does not get overwhelmed. What I'm unable to determine so far is whether this is a problem with mod_proxy(_(balance|ajp))?.c, a problem with Tomcat + Apache, a problem with the APR utilities, etc. :( Any insights from folks would be much appreciated! Apologies for the long post... Jim

Thank you very much for your detailed post. I suspect a thing that has been fixed in the meantime. Could you please give http://httpd.apache.org/dev/dist/httpd-2.2.1.tar.gz a try? Keep in mind that this is NO official release of httpd-2.2.1, but one that the developers are currently deciding whether to release or not.
It is known that this tarball currently contains a bug that prevents using SSL backends with the proxy, but this does not affect your current problem. It would be very nice if you could give it a try and let me know the results.

Another note regarding your surprise at the reuse of connections: keep in mind that the pool limit is PER httpd process. So the maximum number of connections is not what you set via max, but <max> * <maximum number of httpd processes>. You have not posted your MPM settings here, so if you allow httpd to create 8 processes, 200 connections to the backend are OK.

> And netstat shows two *new* connections, for a total of four:
>
> ; netstat -t | egrep "^tcp.*:8009" | tr -s " "
> tcp 0 0 server.mydomain.org:8009 server.mydomain.org:39753 ESTABLISHED
> tcp 0 0 server.mydomain.org:8009 server.mydomain.org:39642 ESTABLISHED
> tcp 0 0 server.mydomain.org:39753 server.mydomain.org:8009 ESTABLISHED
> tcp 0 0 server.mydomain.org:39642 server.mydomain.org:8009 ESTABLISHED

Actually there is only *one* more connection, but as httpd and Tomcat are running on the same server you see *both* ends of the TCP connection in netstat. If you are running httpd and Tomcat on the same server you have to divide your netstat results by 2.

(In reply to comment #10)
> Thank you very much for your detailed post. I suspect a thing that has been
> fixed meanwhile. Could you please give
> http://httpd.apache.org/dev/dist/httpd-2.2.1.tar.gz a try?

I certainly would be happy to! I was about to post that I had added some more debugging and discovered that mod_proxy was emitting info that it had initialized worker->id 0 and worker->id 2, but not worker->id 1, and that mod_proxy_ajp was being handed worker->id 1. Dunno if that was the problem being resolved in the fix. I will try 2.2.1 right now. It'll take me a few minutes to compile and test, but I'll post back here within the hour.
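The divide-by-two caveat above can also be sidestepped directly: when httpd and Tomcat share a host, counting only the rows whose foreign address is the AJP port tallies each connection exactly once. A sketch using the four netstat lines quoted above (hostnames shortened; on a live box the printf becomes `netstat -tn`):

```shell
# Each loopback AJP connection appears twice in netstat (one row per
# socket end). Keeping only rows whose foreign address (column 5) ends
# in :8009 counts every connection exactly once.
printf '%s\n' \
  'tcp 0 0 server:8009 server:39753 ESTABLISHED' \
  'tcp 0 0 server:8009 server:39642 ESTABLISHED' \
  'tcp 0 0 server:39753 server:8009 ESTABLISHED' \
  'tcp 0 0 server:39642 server:8009 ESTABLISHED' |
  awk '$5 ~ /:8009$/ && $6 == "ESTABLISHED"' | wc -l
```

This prints 2: two connections, not four, matching the correction above.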
> Actually there is only *one* more connection, but as httpd and Tomcat are
> running on the same server you see *both* ends of the tcp connection in

Sorry - rereading my original post shows I put too much stress on the high number of connections. I was in fact aware that the two lines were from the same connection; I was just surprised that new sockets kept getting added instead of old ones being reused.

(In reply to comment #10)
> I suspect a thing that has been fixed meanwhile. Could you please give
> http://httpd.apache.org/dev/dist/httpd-2.2.1.tar.gz a try? Keep in mind that
> this is NO official release of httpd-2.2.1, but one that the developers are
> currently deciding on whether to release it or not. It is known that this
> tarball currently contains a bug that prevents using SSL backends with the
> proxy, but this does not harm your current problem. It would be very nice if
> you could give it a try and let me know the results.

Hi, I tried out 2.2.1, and I'm now seeing the kind of connection growth I expected. I slowly increased the concurrent requests (from the original *1* thread I was using in the tests I originally posted about) to tens of threads, without seeing an undue increase in the number of used connections. I note that Tomcat does still eventually reach 200 active threads when I hit Apache with many requests, but it appears as though Apache is now reusing existing connections. From this first pass, it looks to me like 2.2.1 fixes the problem I was having. Thank you very much for your help!

(In reply to comment #12)
> I note that Tomcat does still eventually reach 200 active threads when
> I hit Apache with many requests, but it appears as though Apache is

As mentioned, this possibly works as designed. This depends on your MPM settings.

> From this first pass, it looks to me like 2.2.1 fixes the problem I was

Sounds good. Please let me know the results if you do more intense tests.
Created attachment 18055 [details]
make mod_proxy_balancer init_balancer_member call ap_proxy_initialize_worker

(In reply to comment #13)
> As mentioned, this possibly works as designed. This depends on your MPM settings.
>
> > From this first pass, it looks to me like 2.2.1 fixes the problem I was
>
> Sounds good. Please let me know the results if you do more intense tests.

Hi, I've done some more testing, and I *think* I may have found a bug with the load balancer. I am running httpd 2.2.1, with server/thread limits of:

StartServers 8
ServerLimit 16
ThreadsPerChild 64

I allowed for 1024 threads on the Tomcat side. The proxy setup is the same as I mention in comment #9. When I was initially load testing the 2.2.1 server, I happened to be setting the max attribute on the balancer:

<Proxy balancer://tomcat>
    BalancerMember ajp://server.mydomain.org:8009 route=server.mydomain.org max=64
</Proxy>

This seemed rock solid; I threw batches of 500+ concurrent requests from our Sun Fire T2000 at this Linux server, and it dealt with the load. However, I then tried removing the 'max' attribute, and ran into problems. I found that, without the hard-coded max attribute, I could reliably lock the system up when throwing 500 concurrent requests at it.

My understanding from previous comments and the notes is that, if one does not specify additional attributes, the defaults should use ThreadsPerChild to set smax and max:

min=0 smax=64 max=64

Setting LogLevel to debug, I saw that the mod_proxy code says:

[debug] proxy_util.c(1690): proxy: initialized worker 0 in child 15926 for (argo02.highwire.org) min=0 max=64 smax=64
[debug] proxy_util.c(1690): proxy: initialized worker 2 in child 15926 for (*) min=0 max=64 smax=64

But it never said it had initialized worker 1.
Adding my own debugging, I can see that 'worker 1' is the one which is being used by the balancer, and that the 'worker 1' hmax was always 0 *if I did not set the 'max=64' attribute*. In other words, it was never inheriting the 'default' hmax.

Delving into the mod_proxy_balancer code, I found that init_balancer_member(proxy_server_conf *, server_rec *, proxy_balancer *) at mod_proxy_balancer.c:77 was running

ap_proxy_initialize_worker_share(conf, workers, s);

but was not running, as far as I could tell, anything which might initialize all the other fields for the worker. Is this a bug? I noted that mod_proxy.c:1867-1868 performs a two-step initialization, where it calls ap_proxy_initialize_worker_share and then calls

ap_proxy_initialize_worker(workers, s);

Now, I don't know if this lack of a two-step initialization in mod_proxy_balancer is a problem (or if it is just a lack of understanding of the code on my part), but I am finding that adding the second initialization call to init_balancer_member seemed to fix the problem for me (the default, ThreadsPerChild-based, hmax was picked up for worker 1). Any thoughts? Thank you for your time, Jim

Folks, I don't know if my previous followup on this bug was read or not, or if people are already looking at it. Without feedback, I can't tell if I should keep pursuing this, follow up with more details, etc. I can understand somebody telling me "No, we know this isn't the problem", but I'm not hearing anything. Without feedback I'm stuck not knowing if this is a bug or a fatal misunderstanding on my part.
Given httpd 2.2.1 and the following settings:

Server Version: Apache/2.2.1 (Unix) DAV/2 mod_ssl/2.2.1 OpenSSL/0.9.7a
Server Built: Apr 6 2006 00:22:26
Module Magic Number: 20051115:1
Hostname/port: server.mydomain.org:80
Timeouts: connection: 300, keep-alive: 300
MPM Name: Worker
MPM Information: Max Daemons: 2, Threaded: yes, Forked: yes
Server Architecture: 32-bit
Server Root: /highwire/server/apache
Config File: -c/-C directives
Server Built With:
-D APACHE_MPM_DIR="server/mpm/worker"
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
-D APR_USE_SYSVSEM_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D HTTPD_ROOT="/usr/local/apache/2.2.1"
-D SUEXEC_BIN="/usr/local/apache/2.2.1/bin/suexec"
-D DEFAULT_ERRORLOG="logs/error_log"
-D AP_TYPES_CONFIG_FILE="conf/mime.types"
-D SERVER_CONFIG_FILE="conf/httpd.conf"

It appears as though the mod_proxy pool of connections is not being consistently reused when the balancer is used.
Given the configuration:

ServerLimit 2
ThreadsPerChild 10

<Proxy balancer://tomcat>
    BalancerMember ajp://server.mydomain.org:8009 route=server.mydomain.org
</Proxy>

<VirtualHost vhost.mydomain.org:80>
    ServerName vhost.mydomain.org
    RewriteEngine on
    RewriteCond %{ENV:HTTPD_BASE}/logs/myapp.pause -f
    RewriteRule .* - [forbidden,last]
    DocumentRoot vhosts/myapp/htdocs
    RewriteRule ^(/+test(?:/.*)?)$ /myapp$1 [env=internal:yes]
    RewriteCond %{ENV:internal} =yes
    RewriteRule ^(/+myapp/.*) $1 [passthrough,last]
    RewriteRule ^/+myapp/(.*) /$1 [redirect=permanent,last]
    ProxyPass /myapp balancer://tomcat/myapp stickysession=JSESSIONID nofailover=on
    ProxyPassReverse / http://vhost.mydomain.org/myapp/
    ProxyPassReverseCookiePath /myapp/ /
</VirtualHost>

My understanding of the documentation at http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass is that, with the above ServerLimit and ThreadsPerChild limits, we should never see more than 20 connections form between Apache and any one BalancerMember target (2 servers x 10 threads = 20 total threads).

On startup, we see zero connections between Apache and Tomcat:

; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"
;

I make one request, and see one connection form:

; ab -n 1 http://vhost.mydomain.org/test
; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"
tcp 0 0 server.mydomain.org:59165 server.mydomain.org:8009 ESTABLISHED

Now what happens if I make 20 concurrent requests? What I would expect to see is up to 20 connections established, and no more.

; ab -c 20 -n 20 http://vhost.mydomain.org/test
; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"|wc -l
21

If I make another 20 connections, I see the connections more than double:

; ab -c 20 -n 20 http://vhost.mydomain.org/test
; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"|wc -l
44

Now look what happens when I plug min/smax/max settings into the BalancerMember directive.
These are the settings which I think the documentation indicates are the defaults, given my configuration above:

<Proxy balancer://tomcat>
    BalancerMember ajp://server.mydomain.org:8009 route=server.mydomain.org min=0 smax=10 max=10
</Proxy>

After an Apache restart, we have no connections; I make a bunch of requests, and see 20 connections, and never more than that. Even when I make 50 concurrent requests, Apache simply queues and processes as it should:

; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"|wc -l
0
; ab -c 20 -n 20 http://vhost.mydomain.org/test
; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"|wc -l
20
; ab -c 50 -n 50 http://vhost.mydomain.org/test
; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"|wc -l
20

If we activate the balancer, and the balancer has only one Tomcat target as in this case, shouldn't it still obey the same limits as defined in the configuration? Otherwise we have a situation where the number of proxy connections is unbounded, right?

Say Apache is configured to serve no more than 1024 requests at any one moment in time, and we wanted to balance load between two Tomcat servers. What we'd probably do is set up two Tomcat servers which are each configured to accept 1024 requests, but we would expect to see the Apache balancer send ~512 requests to each Tomcat server. What we would *not* expect to see is Apache opening more connections to a Tomcat server than Apache itself is configured to serve at any one time.

In summary, it appears to be a bug to have to declare min/smax/max values which are the same as the defaults per the documentation; without them, the number of connections opened appears to be unbounded.
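The expected ceiling used throughout this comment is simple arithmetic; a trivial sketch with the ServerLimit and ThreadsPerChild values from the configuration above:

```shell
# Upper bound on pooled backend connections with the worker MPM: each
# child holds at most ThreadsPerChild connections, and there are at
# most ServerLimit children.
SERVER_LIMIT=2
THREADS_PER_CHILD=10
echo "$((SERVER_LIMIT * THREADS_PER_CHILD))"   # prints 20
```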
My examination of the code led me to believe that the source of the problem is that the BalancerMember worker is not getting its hmax value properly initialized, and that it is therefore bypassing the pool of reusable connections when it acquires connections (proxy_util.c:1758, in ap_proxy_acquire_connection). If I add the patch I submitted in comment #14 and remove the min/smax/max directives, I see the behavior I am expecting - Apache never opens more than 20 connections in all to its Tomcat server.

Looking at svn.apache.org, I see that this initialization was actually removed in a previous version, though the comments do not explain why it is not appropriate to initialize the balancer worker: http://svn.apache.org/viewcvs.cgi/httpd/httpd/trunk/modules/proxy/proxy_balancer.c?rev=105347&r1=105320&r2=105347

The only reason I can think of to have the current behavior is that somehow the dynamic of balancing between multiple backend servers is expected to be able to handle more connections - but it just doesn't make sense to me that it would ever be normal for Apache to open more connections to any one backend server than Apache can serve itself. If this isn't considered a bug, I'd very much appreciate it if someone would point out the reason. Thank you for your time, Jim

You are correct, this is a bug. I checked your patch and it looks fine. I have not committed it yet, because I want to find out why we are making a copy of each worker we add to a balancer (see http://mail-archives.apache.org/mod_mbox/httpd-dev/200604.mbox/%3c443F81D8.3030401@apache.org%3e). If we did not make a copy of each worker, there would be no need for additional initialization.

Thank you for the details. My impression has been that the additional worker copies are somehow meant to be virtual, and the intent was to have them resolve to the actual workers. I wasn't actually seeing where that occurred in the code, though. I'll watch the email thread you refer to with interest!
Please let me know if you need any further debugging/testing done.

If it turns out that copying the worker is not really the correct thing to do, it would require several changes to the code to do this differently. These changes will take some time. As your patch does not run in the wrong direction, and we have actually found the bug now, I committed your patch to the trunk as r394446 (http://svn.apache.org/viewcvs?rev=394446&view=rev).

Proposed for backport to 2.2.x as r394653 (http://svn.apache.org/viewcvs?rev=394653&view=rev).

Backported to 2.2.x as r396050 (http://svn.apache.org/viewcvs?rev=396050&view=rev).

*** Bug 39267 has been marked as a duplicate of this bug. ***