Bug 38227 - AJP proxy not thread-safe?
Summary: AJP proxy not thread-safe?
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy
Version: 2.2.0
Hardware: Other Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Duplicates: 39267
Depends on:
Blocks:
 
Reported: 2006-01-11 20:44 UTC by Brad Boyer
Modified: 2006-04-22 08:27 UTC



Attachments
Debugging output patch referenced in comment #9. (1.53 KB, patch)
2006-04-06 06:03 UTC, James A. Robinson
Details | Diff
make mod_proxy_balancer init_balancer_member call ap_proxy_initialize_worker (513 bytes, patch)
2006-04-11 00:40 UTC, James A. Robinson
Details | Diff

Description Brad Boyer 2006-01-11 20:44:12 UTC
I had to change from the worker MPM to prefork to avoid connection loss with
the AJP proxy code. I'm using httpd 2.2.0 and Tomcat 5.5.12. I get these errors
in the Apache httpd error log:

[Wed Jan 11 11:20:33 2006] [error] (70007)The timeout specified has expired:
ajp_ilink_receive() can't receive header
[Wed Jan 11 11:20:33 2006] [error] ajp_read_header: ajp_ilink_receive failed
[Wed Jan 11 11:20:33 2006] [error] (120006)APR does not understand this error
code: proxy: read response failed from (null) (127.0.0.1)

These 3 lines repeat numerous times, and httpd does not seem to reconnect to
Tomcat until it is reloaded. Changing to prefork appears to eliminate these errors.

This situation occurs after less than an hour of load testing, and is fairly
consistent. With the MPM changed over to prefork, the error has not been seen
over much longer test periods under the same load.

The configuration is an Apache httpd doing a ProxyPass to a Tomcat running on
the same box. If it matters, the httpd is also using mod_ssl and these requests
are coming in over https.

This is a locally compiled build of 2.2.0, with the only changed code being the
patch that can be found in bug 29744.
Comment 1 Ruediger Pluem 2006-01-11 22:26:19 UTC
In order to make it reconnect, please add retry=1 to your ProxyPass directive
(see also the retry parameter at
http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass).
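
For example (the path and backend here are placeholders, mirroring the
reporter's setup):

    ProxyPass /xxx ajp://127.0.0.1:8009/xxx retry=1
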
Have you checked the Tomcat logs? The first error seems to indicate that
Tomcat is no longer responding.
Comment 2 Malcolm Amir Hussain-Gambles 2006-02-21 16:57:33 UTC
I have experienced the same bug. The problem also seems to cause the Tomcat
server (5.0.28) to run out of threads. I assume the problem is also causing the
AJP threads not to be closed, so I assume using a retry will not help.
I have yet to confirm whether using prefork instead of worker fixes the problem,
but I'll try this and add the result to the bug report.

ajp_read_header: ajp_ilink_receive failed
(120006)APR does not understand this error code: proxy: read response failed
from (null) (<ip_addr_here>)
Then later:
proxy: AJP: failed to make connection to backend: <ip_addr_here>
(110)Connection timed out: proxy: AJP: attempt to connect to <ip_addr_here>:8009
(<ip_addr_here>) failed

It looks like Tomcat can recover if it is under light load and the threads
time out, but when the load increases Tomcat dies as all the threads get used up.

Comment 3 Ruediger Pluem 2006-02-21 22:36:09 UTC
Maybe PR#36495 is a similar bug. Could you please provide your proxy config?
Comment 4 Brad Boyer 2006-02-22 00:47:46 UTC
The config in bug 36495 is very similar to my original situation: Apache httpd
2.2 and Tomcat running together on a single Linux box, with httpd configured
to use ProxyPass to redirect to 127.0.0.1:8009. The messages once the poster
moved to 2.2 are basically the same.

Here's the proxy config I have (with some URL segments changed to xxx):

    ProxyRequests off
    <Proxy *>
        Order allow,deny
        Deny from all
    </Proxy>
    <Proxy ajp://127.0.0.1:8009/*>
        Order deny,allow
        Allow from all
    </Proxy>
    ProxyPass /xxx ajp://127.0.0.1:8009/xxx
Comment 5 Malcolm Amir Hussain-Gambles 2006-02-27 13:00:26 UTC
After further testing this appears to be a problem with a misconfigured Tomcat:
the AJP connector behaves differently under the worker MPM than under prefork and
requires a larger number of Tomcat AJP threads. I tested this under extremely
heavy load and all seems to work OK with no problems.
This is all documented in the connector docs; I guess I should have picked this
up earlier.
Comment 6 Ruediger Pluem 2006-02-27 13:49:18 UTC
Thanks for the feedback. Could you please add a link to the connector docs where
you found the information that solved your problem?
Comment 7 Malcolm Amir Hussain-Gambles 2006-03-01 11:11:57 UTC
Fixing the Tomcat configuration did help, but it still appears that we are
having the same problem where Tomcat will run out of threads using AJP mod_proxy
under the worker MPM.
We cannot reproduce this problem in testing, only on the live systems.
There are no real errors from Apache except this, by which point Tomcat has run
out of threads, which I assume is why this error appears:
[error] (70007)The timeout specified has expired: ajp_ilink_receive() can't
receive header

We also get this error in prefork mode, but it does not seem to cause Tomcat to die.
Comment 8 Tino Schwarze 2006-03-23 16:51:44 UTC
I'm seeing similar errors. 

The setup is Apache 2.2.0 with mod_proxy_ajp and Tomcat 5.0.27. I tried
RewriteRules with [P] first, then switched to ProxyPass - same result.

This is the log of the virtual host (server names replaced by $MYSERVER, IPs
replaced by $LOCALIP):
==> httpd_error_log <==
[Thu Mar 23 16:23:09 2006] [error] ajp_read_header: ajp_ilink_receive failed
[Thu Mar 23 16:23:09 2006] [error] (120006)APR does not understand this error
code: proxy: read response failed from (null) (localhost)
This is the global apache log:
==> /opt/httpd/logs/error_log <==
[Thu Mar 23 16:23:09 2006] [error] [client $LOCALIP] proxy: error reading status
line from remote server $MYSERVER referer: $REFERER
[Thu Mar 23 16:23:09 2006] [error] [client $LOCALIP] proxy: Error reading from
remote server returned by $URL, referer: $REFERER
[Thu Mar 23 16:23:09 2006] [error] (70007)The timeout specified has expired:
ajp_ilink_receive() can't receive header


Nothing fancy shows up in catalina.out or Tomcat logs, just a lot of

Mar 23, 2006 4:24:40 PM org.apache.jk.common.ChannelSocket processConnection
INFO: connection timeout reached

but they seem normal and occur all the time.

I do _not_ need heavy load to see this behaviour; I see it almost instantly.

Things get worse: Connections from Apache to Tomcat stay open:

[root@server root]# netstat -tn|sed -n -e '1,2p;/8059/p'
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        1      0 127.0.0.1:46802         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46814         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46790         127.0.0.1:8059          CLOSE_WAIT
tcp        0      0 127.0.0.1:46826         127.0.0.1:8059          ESTABLISHED
tcp        1      0 127.0.0.1:46729         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46777         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46753         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46765         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46697         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46604         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46636         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46544         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46551         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46518         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46505         127.0.0.1:8059          CLOSE_WAIT
tcp        1      0 127.0.0.1:46414         127.0.0.1:8059          CLOSE_WAIT
tcp      794      0 127.0.0.1:8059          127.0.0.1:46328         CLOSE_WAIT
tcp        0      0 127.0.0.1:8059          127.0.0.1:46826         ESTABLISHED

There are times when no ESTABLISHED connection is available - it seems to take
quite a long time for Tomcat to recover. Note that the Recv-Q has data in it!

server.xml snippet:

    <Connector protocol="AJP/1.3"
               address="127.0.0.1"
               port="8059"
               minProcessors="10"
               maxProcessors="250"
               maxPostSize="0"
               enableLookups="false"
               useBodyEncodingForURI="true"
               acceptCount="25"
               debug="0"
               redirectPort=""
               connectionTimeout="60000"
    />

(Yes, there are probably some superfluous settings there).

I do not see an increase in thread count or apache child count (using prefork MPM).
Comment 9 James A. Robinson 2006-04-06 06:03:26 UTC
Created attachment 18031 [details]
Debugging output patch referenced in comment #9.

I believe I'm seeing the same problem with my build of Apache 2.2.0
connecting to an instance of Tomcat 5.5.10.

THE SETUP

I'm running this on a dual-CPU (Intel(R) Xeon(TM) CPU 2.80GHz) machine
running Linux 2.4.21-15.ELsmp. Apache server-info reveals:

  -D APACHE_MPM_DIR="server/mpm/worker"
  -D APR_HAS_SENDFILE -D APR_HAS_MMAP
  -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
  -D APR_USE_SYSVSEM_SERIALIZE
  -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
  -D APR_HAS_OTHER_CHILD
  -D AP_HAVE_RELIABLE_PIPED_LOGS
  -D HTTPD_ROOT="/usr/local/apache/2.2.0"
  -D SUEXEC_BIN="/usr/local/apache/2.2.0/bin/suexec"
  -D DEFAULT_ERRORLOG="logs/error_log"
  -D AP_TYPES_CONFIG_FILE="conf/mime.types"
  -D SERVER_CONFIG_FILE="conf/httpd.conf"

I'm using mod_proxy to balance incoming requests for an Apache
VirtualHost to a mounted Tomcat service via AJP.  I think I've
included the important Apache configuration below:

  <Proxy balancer://tomcat>
    BalancerMember ajp://server.mydomain.org:8009 route=server.mydomain.org
  </Proxy>

  <VirtualHost vhost.mydomain.org:80>
    RewriteEngine on
    RewriteCond %{ENV:HTTPD_BASE}/logs/myapp.pause -f
    RewriteRule .* - [forbidden,last]
    RewriteRule ^(/+test(?:/.*)?)$ /myapp$1 [env=internal:yes]
    RewriteCond %{ENV:internal} =yes
    RewriteRule ^(/+myapp/.*) $1 [passthrough,last]
    RewriteRule ^/+myapp/(.*) /$1 [redirect=permanent,last]

    ProxyPass /myapp balancer://tomcat/myapp stickysession=JSESSIONID nofailover=on
    ProxyPassReverse / http://vhost.mydomain.org/myapp/
    ProxyPassReverseCookiePath /myapp/ /
  </VirtualHost>

My Tomcat Connector in server.xml is straightforward:

    <Connector protocol="AJP/1.3"
               address="${catalina.hostname}"
               port="8009"
               enableLookups="false" />

So a user connects to

  http://vhost.mydomain.org/test

and then mod_proxy will forward the request via AJP to

  http://server.mydomain.org:8009/myapp/test

The 'test' servlet is a simple one.  It computes a random number between
0 and 1000, and sleeps for that many milliseconds before printing a
one-line response to the client.


THE PROBLEM

Using this stripped-down configuration, what I see when I use ab(1) to
load test the service is that Tomcat is forced to keep creating new
threads to service incoming connections, and it leaves them in the
Keepalive state after ab(1) has finished.  Tomcat also indicates it is
*using* those threads (perhaps polling on them?).

Every time I make a new request via mod_proxy, a new thread is started
in Tomcat.  However, when I query Tomcat directly, I see a thread get
created, the request get serviced, and the thread switch to the
Ready state, indicating it is now able to accept new requests.

If I run

  netstat -t | egrep "^tcp.*:8009"

on the server, I see a constantly growing number of connections
between Apache and Tomcat.  When I query Tomcat directly, I do not see
this growing pool of connections.

This led me to believe the problem was mod_proxy keeping connections
open (perhaps the backend pooling code), but failing to either reuse
them or to properly close them on cleanup.


THE DEBUGGING


First, I added some debugging to proxy_util.c; I've attached the patch to
this comment.  Basically all I did was print out a log message when the
worker first initializes, and then each time a connection is acquired.
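
The attachment has the authoritative diff; the logging calls are
presumably along these lines (a sketch, not the verbatim patch, using
APLOG_ERR so the output shows up at the default LogLevel):

  ap_log_error(APLOG_MARK, APLOG_ERR, 0, s,
               "1: worker->hmax: %d, worker->cp->res: %s",
               worker->hmax,
               worker->cp->res ? "not null" : "null");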

Now, starting up Apache I see this in the log:

[Wed Apr 05 21:09:35 2006] [error] 0: worker->hmax: 25, worker->cp->res: not null
[Wed Apr 05 21:09:35 2006] [debug] proxy_util.c(1666): proxy: initialized worker 0 in child 16610 for (server.mydomain.org) min=0 max=25 smax=25
[Wed Apr 05 21:09:35 2006] [error] 0: worker->hmax: 25, worker->cp->res: not null
[Wed Apr 05 21:09:35 2006] [debug] proxy_util.c(1666): proxy: initialized worker 2 in child 16610 for (*) min=0 max=25 smax=25

For my test I first see if any connections exist to Tomcat (there
should not be any):

 ; netstat -t | egrep "^tcp.*:8009" | tr -s " "

As I expect, no connections exist.  Next, on a different machine, I
make a single request to my virtual host:

  ab -n 1  http://vhost.mydomain.org/test

It completes, and tells me it was successful.  I see a bunch of
debugging in the log indicating mod_proxy is working.  In the chatter,
I see my own debugging lines:

[Wed Apr 05 21:09:43 2006] [error] 1: worker->hmax: 0, worker->cp->res: not null
[Wed Apr 05 21:09:43 2006] [error] 2: called connection_constructor

This surprises me for two reasons.  First of all, I was expecting
worker->hmax to be 25, not 0.  The second reason is that I expected
worker->cp->res to have a value, not to be null.

[As an aside, subsequent testing showed that if I added 'max=25' to
the BalancerMember configuration directive, worker->hmax would be 25 in
requests; the null value for worker->cp->res did not change.]

Now, I check on my server to see if any connections exist (I expect a
connection, due to the connection pooling):

; netstat -t | egrep "^tcp.*:8009" | tr -s " "
tcp 0 0 server.mydomain.org:8009 server.mydomain.org:39642 ESTABLISHED
tcp 0 0 server.mydomain.org:39642 server.mydomain.org:8009 ESTABLISHED


Now, if I run a second request, I would have expected mod_proxy_ajp to
reuse the existing connection.  But look what happens after I ran my
ab command a second time:

[Wed Apr 05 21:09:46 2006] [error] 1: worker->hmax: 0, worker->cp->res: null
[Wed Apr 05 21:09:46 2006] [error] 2: called connection_constructor

And netstat shows two *new* connections, for a total of four:

  ; netstat -t | egrep "^tcp.*:8009" | tr -s " "
  tcp 0 0 server.mydomain.org:8009 server.mydomain.org:39753 ESTABLISHED
  tcp 0 0 server.mydomain.org:8009 server.mydomain.org:39642 ESTABLISHED
  tcp 0 0 server.mydomain.org:39753 server.mydomain.org:8009 ESTABLISHED
  tcp 0 0 server.mydomain.org:39642 server.mydomain.org:8009 ESTABLISHED

I tried making a number of connections, and watched the connections
grow until they filled all available threads on my Tomcat container
(which was set to the default maxThreads of 200).  First, I reset
everything:

  ; apachectl stop
  ; catalina.sh stop

  ; apachectl start
  ; catalina.sh start

  ; netstat -t | egrep "^tcp.*:8009" | tr -s " "|wc -l
	0

And then on my test client I make 200 connections:

   ab -n 200 http://vhost.mydomain.org/test

and then my server shows over 400 established connections:

  ; netstat -t | egrep "^tcp.*:8009" | tr -s " "|grep -c ESTABLISHED
  466

Ack!

If I tweak the proxy code, setting these two 'close'
flags to true:

  conn->close_on_recycle = 1;
  conn->close = 1;

Then Apache cleans itself up, and Tomcat does not get overwhelmed.

What I'm unable to determine so far is whether this is a problem with
mod_proxy(_(balance|ajp))?.c, a problem with Tomcat + Apache, a
problem with the APR utilities, etc.  :(

Any insights from folks would be much appreciated!

Apologies for the long post...


Jim
Comment 10 Ruediger Pluem 2006-04-06 07:47:03 UTC
Thank you very much for your detailed post. I suspect an issue that has been
fixed in the meantime. Could you please give
http://httpd.apache.org/dev/dist/httpd-2.2.1.tar.gz a try? Keep in mind that
this is NO official release of httpd-2.2.1, but one that the developers are
currently deciding whether to release or not. It is known that this
tarball currently contains a bug that prevents using SSL backends with the
proxy, but this does not affect your current problem. It would be very nice if
you could give it a try and let me know the results.

Another note regarding your surprise about the reuse of connections.
Keep in mind that the pool limit is PER httpd process. So the maximum number of
connections is not what you set via max, but <max> * <maximum number of httpd
processes>. You have not posted your MPM settings here, so if you allow httpd to
create 8 processes, 200 connections to the backend are ok.

> And netstat shows two *new* connections, for a total of four:
> 
>  ; netstat -t | egrep "^tcp.*:8009" | tr -s " "
>  tcp 0 0 server.mydomain.org:8009 server.mydomain.org:39753 ESTABLISHED
>  tcp 0 0 server.mydomain.org:8009 server.mydomain.org:39642 ESTABLISHED
>  tcp 0 0 server.mydomain.org:39753 server.mydomain.org:8009 ESTABLISHED
>  tcp 0 0 server.mydomain.org:39642 server.mydomain.org:8009 ESTABLISHED

Actually there is only *one* more connection, but as httpd and Tomcat are
running on the same server you see *both* ends of the tcp connection in netstat.
If you are running httpd and Tomcat on the same server, you have to divide your
netstat results by 2.
Comment 11 James A. Robinson 2006-04-06 08:15:20 UTC
(In reply to comment #10)
> Thank you very much for your detailed post. I suspect a thing that has been
> fixed meanwhile. Could you please give
> http://httpd.apache.org/dev/dist/httpd-2.2.1.tar.gz a try?

I certainly would be happy to!  I was about to post that I had added
some more debugging and discovered that mod_proxy was emitting
info that it had initialized worker->id 0 and worker->id 2, but not
worker->id 1, and that mod_proxy_ajp was being handed worker->id 1. Dunno
if that was the problem being resolved in the fix.

I will try 2.2.1 right now.  It'll take me a few minutes to compile and
test, but I'll post back here within the hour.

> Actually there is only *one* more connection, but as httpd and Tomcat are
> running on the same server you see *both* ends of the tcp connection in

Sorry, reading my original post shows I put too much stress on the high
number of connections.  I did in fact realize the two lines were from
the same connection; I was just surprised that new sockets kept getting
added instead of old ones being reused.
Comment 12 James A. Robinson 2006-04-06 08:54:15 UTC
(In reply to comment #10)
> I suspect a thing that has been fixed meanwhile. Could you please give
> http://httpd.apache.org/dev/dist/httpd-2.2.1.tar.gz a try? Keep in mind that
> this is NO official release of httpd-2.2.1, but one that the developers are
> currently deciding on whether to release it or not. It is known that this
> tarball currently contains a bug that prevents using SSL backends with the
> proxy, but this does not harm your current problem. It would be very nice if 
> you could give it a try and let me know the results.

Hi,

I tried out 2.2.1, and I'm now seeing the kind of connection growth I
expected.

I slowly increased the concurrent requests (from the original *1* thread
I was using in the tests I originally posted about) to tens of threads,
without seeing an undue increase in the number of used connections.

I note that Tomcat does still eventually reach 200 active threads when
I hit Apache with many requests, but it appears as though Apache is
now reusing existing connections.

From this first pass, it looks to me like 2.2.1 fixes the problem I was
having. Thank you very much for your help!
Comment 13 Ruediger Pluem 2006-04-06 12:40:50 UTC
(In reply to comment #12)

> 
> I note that Tomcat does still eventually reach 200 active threads when
> I hit Apache with many requests, but it appears as though Apache is

As mentioned, this possibly works as designed. This depends on your MPM settings.

> 
> From this first pass, it looks to me like 2.2.1 fixes the problem I was

Sounds good. Please let me know the results if you do more intense tests.

Comment 14 James A. Robinson 2006-04-11 00:40:23 UTC
Created attachment 18055 [details]
make mod_proxy_balancer init_balancer_member call ap_proxy_initialize_worker

(In reply to comment #13)
>
> As mentioned, this possibly works as designed. This depends on your MPM
> settings.
> 
> > 
> > From this first pass, it looks to me like 2.2.1 fixes the problem I was
> 
> Sounds good. Please let me know the results if you do more intense tests.

Hi,

I've done some more testing, and I *think* I may have found a bug with
the load balancer.  I am running httpd 2.2.1, with server/thread
limits of

  StartServers 8
  ServerLimit 16
  ThreadsPerChild 64

I allowed for 1024 threads on the Tomcat side. The proxy setup is the
same as I mentioned in comment #9.

When I was initially load testing the 2.2.1 server, I happened to be
setting the max attribute on the balancer member:

  <Proxy balancer://tomcat>
    BalancerMember ajp://server.mydomain.org:8009 route=server.mydomain.org max=64
  </Proxy>

This seemed rock solid: I threw batches of 500+ concurrent requests
from our Sun Fire T2000 at this Linux server, and it dealt with the
load.

However, I then tried removing the 'max' attribute, and ran into
problems.  I found that, without the hard-coded max attribute, I
could reliably lock the system up when throwing 500 concurrent
requests at it.

My understanding from previous comments and the notes is that, if one
does not specify additional attributes, the defaults should be to use
ThreadsPerChild to set smax and max:

  min=0 smax=64 max=64

Setting LogLevel to debug, I saw that the mod_proxy code says

[debug] proxy_util.c(1690): proxy: initialized worker 0 in child 15926 for (argo02.highwire.org) min=0 max=64 smax=64
[debug] proxy_util.c(1690): proxy: initialized worker 2 in child 15926 for (*) min=0 max=64 smax=64

But it never said it had initialized worker 1.  Adding my own
debugging, I can see that 'worker 1' is the one which is being used by
the balancer, and that the 'worker 1' hmax was always 0 *if I did not
set the 'max=64' attribute*.  In other words, it was never inheriting
the 'default' hmax.

Delving into the mod_proxy_balancer code, I found that

  mod_proxy_balancer.c:77
  init_balancer_member(proxy_server_conf *, server_rec *, proxy_balancer *)

was running

  ap_proxy_initialize_worker_share(conf, workers, s);

but was not running, as far as I could tell, anything which might
initialize all the other fields for the worker.  Is this a bug?

I noted that mod_proxy.c:1867 and mod_proxy.c:1868 perform a two-step
initialization, where it first calls ap_proxy_initialize_worker_share and
then calls

 ap_proxy_initialize_worker(workers, s);

Now, I don't know if this lack of a two-step initialization in
mod_proxy_balancer is a problem (or if it is just a lack of understanding
of the code on my part), but I am finding that adding the second
initialization call to init_balancer_member seems to fix the problem
for me (the default, ThreadsPerChild-based, hmax is picked up for
worker 1).
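
Concretely, the attached patch amounts to something like the following
(a sketch of the change described above, not the verbatim diff):

  /* mod_proxy_balancer.c, init_balancer_member() */
  ap_proxy_initialize_worker_share(conf, workers, s);
  /* added: run the full worker init as mod_proxy.c does, so that
     hmax and friends pick up their ThreadsPerChild-based defaults */
  ap_proxy_initialize_worker(workers, s);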

Any thoughts?

Thank you for your time,

Jim
Comment 15 James A. Robinson 2006-04-13 15:54:30 UTC
Folks,

I don't know if my previous followup on this bug was read or not, or
if people are already looking at it.  Without feedback, I can't tell
if I should keep pursuing this, follow up with more details, etc.

I can understand somebody telling me "No, we know this isn't the
problem" but I'm not hearing anything.  Without feedback I'm stuck not
knowing if this is a bug or a fatal misunderstanding on my part.

Given httpd 2.2.1 and the following settings:

  Server Version: Apache/2.2.1 (Unix) DAV/2 mod_ssl/2.2.1 OpenSSL/0.9.7a
  Server Built: Apr 6 2006 00:22:26
  Module Magic Number: 20051115:1
  Hostname/port: server.mydomain.org:80
  Timeouts: connection: 300    keep-alive: 300
  MPM Name: Worker
  MPM Information: Max Daemons: 2 Threaded: yes Forked: yes
  Server Architecture: 32-bit
  Server Root: /highwire/server/apache
  Config File: -c/-C directives
  Server Built With:
    -D APACHE_MPM_DIR="server/mpm/worker"
    -D APR_HAS_SENDFILE
    -D APR_HAS_MMAP
    -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
    -D APR_USE_SYSVSEM_SERIALIZE
    -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
    -D APR_HAS_OTHER_CHILD
    -D AP_HAVE_RELIABLE_PIPED_LOGS
    -D HTTPD_ROOT="/usr/local/apache/2.2.1"
    -D SUEXEC_BIN="/usr/local/apache/2.2.1/bin/suexec"
    -D DEFAULT_ERRORLOG="logs/error_log"
    -D AP_TYPES_CONFIG_FILE="conf/mime.types"
    -D SERVER_CONFIG_FILE="conf/httpd.conf"

It appears as though the mod_proxy pool of connections is not being
consistently reused when the balancer is used.  Given the
configuration:

  ServerLimit 2
  ThreadsPerChild 10

  <Proxy balancer://tomcat>
    BalancerMember ajp://server.mydomain.org:8009 route=server.mydomain.org
  </Proxy>

  <VirtualHost  vhost.mydomain.org:80>
    ServerName  vhost.mydomain.org

    RewriteEngine on

    RewriteCond %{ENV:HTTPD_BASE}/logs/myapp.pause -f
    RewriteRule .* - [forbidden,last]

    DocumentRoot vhosts/myapp/htdocs

    RewriteRule ^(/+test(?:/.*)?)$ /myapp$1 [env=internal:yes]

    RewriteCond %{ENV:internal} =yes
    RewriteRule ^(/+myapp/.*)  $1 [passthrough,last]
    RewriteRule ^/+myapp/(.*) /$1 [redirect=permanent,last]

    ProxyPass /myapp balancer://tomcat/myapp stickysession=JSESSIONID nofailover=on
    ProxyPassReverse / http://vhost.mydomain.org/myapp/
    ProxyPassReverseCookiePath /myapp/ /
  </VirtualHost>

My understanding of the documentation

  http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass

is that, with the above ServerLimit and ThreadsPerChild limits, we
should never see more than 20 connections form between Apache and any
one BalancerMember target (2 servers x 10 threads = 20 total threads).

On startup, we see zero connections between Apache and Tomcat:

  ; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"
  ;

I make one request, and see one connection form:

  ; ab -n 1 http://vhost.mydomain.org/test
  ; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"
  tcp 0 0 server.mydomain.org:59165 server.mydomain.org:8009 ESTABLISHED

Now what happens if I make 20 concurrent requests? What I would expect to
see is up to 20 connections established, and no more.

  ; ab -c 20 -n 20 http://vhost.mydomain.org/test
  ; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"|wc -l
       21

If I make another 20 connections, I see the connections more than
double:

  ; ab -c 20 -n 20 http://vhost.mydomain.org/test
  ; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"|wc -l
       44

Now look what happens when I plug min/smax/max settings into the
BalancerMember directive.  These are settings which I think the
documentation indicates are the defaults, given my configuration
above:

  <Proxy balancer://tomcat>
    BalancerMember ajp://server.mydomain.org:8009 route=server.mydomain.org min=0 smax=10 max=10
  </Proxy>

After an Apache restart we have no connections; I make a bunch of
requests and see 20 connections, and never more than that.  Even when
I make 50 concurrent requests, Apache simply queues and processes them
as it should:

  ; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"|wc -l
        0

  ; ab -c 20 -n 20 http://vhost.mydomain.org/test
  ; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"|wc -l
       20

  ; ab -c 50 -n 50 http://vhost.mydomain.org/test
  ; netstat -t | tr -s " " | egrep "^tcp.*:8009 ESTABLISHED"|wc -l
       20

If we activate the balancer, and the balancer has only one Tomcat
target as in this case, shouldn't it still obey the same limits
as defined in the configuration?  Otherwise we have a situation
where the number of proxy connections is unbounded, right?

Say Apache is configured to serve no more than 1024 requests at any
one moment in time, and we wanted to balance load between two Tomcat
servers.  What we'd probably do is set up two Tomcat servers which are
each configured to accept 1024 requests, but we would expect to see the
Apache balancer send ~512 requests to each Tomcat server.  What we
would *not* expect to see is Apache opening more connections to a
Tomcat server than Apache itself is configured to serve at any one
time.

In summary, it appears to be a bug that one has to declare min/smax/max
values identical to the documented defaults; without them, the
number of connections opened appears to be unbounded.

My examination of the code led me to believe that the source of the
problem is that the BalancerMember worker is not getting its hmax
value properly initialized, and that it is therefore bypassing the
pool of reusable connections when it is acquiring connections
(proxy_util.c:1758, in ap_proxy_acquire_connection).
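
For context, the relevant branch looks roughly like this (paraphrased
from 2.2-era proxy_util.c; the pooled path is only taken when hmax is
non-zero):

  if (worker->hmax && worker->cp->res) {
      /* pooled path: acquire a reusable connection from the reslist */
      rv = apr_reslist_acquire(worker->cp->res, (void **)conn);
  }
  else {
      /* non-pooled fallback: a connection is created outside the
         reslist, so nothing caps or reuses it */
  }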

If I add the patch I submitted in comment #14 and remove the
min/smax/max directives, I see the behavior I am expecting -- Apache
never opens more than 20 connections in all to its Tomcat server.

Looking at svn.apache.org, I see that initialization was actually
removed in a previous version, though the comments do not explain why
it is not appropriate to initialize the balancer worker:

 
http://svn.apache.org/viewcvs.cgi/httpd/httpd/trunk/modules/proxy/proxy_balancer.c?rev=105347&r1=105320&r2=105347

The only reason I can think of to have the current behavior is that
somehow the dynamic of balancing between multiple backend servers is
expected to be able to handle more connections -- but it just doesn't
make sense to me that it would ever be normal for Apache to open more
connections to any one backend server than Apache can serve itself.

If this isn't considered a bug, I'd very much appreciate it if someone
would point out the reason.

Thank you for your time,


Jim
Comment 16 Ruediger Pluem 2006-04-14 11:43:54 UTC
You are correct, this is a bug. I checked your patch and it looks fine. I have
not committed it yet, because I want to find out why we make a
copy of each worker we add to a balancer (see
http://mail-archives.apache.org/mod_mbox/httpd-dev/200604.mbox/%3c443F81D8.3030401@apache.org%3e).
If we did not make a copy of each worker, there would be no need for additional
initialization.
Comment 17 James A. Robinson 2006-04-14 15:35:13 UTC
Thank you for the details. My impression has been that the additional
worker copies are somehow meant to be virtual, and the intent was to
have them resolve to the actual workers.  I wasn't actually seeing
where that occurred in the code, though.

I'll watch the email thread you refer to with interest! Please let me
know if you need any further debugging/testing done.
Comment 18 Ruediger Pluem 2006-04-16 09:16:34 UTC
If it turns out that copying the worker is not really the correct thing to do, it
would require several changes to the code to do this differently. These changes
will take some time. As your patch does not head in the wrong direction and we
actually have this bug now, I have committed your patch to trunk as r394446
(http://svn.apache.org/viewcvs?rev=394446&view=rev).
Comment 19 Ruediger Pluem 2006-04-17 10:43:41 UTC
Proposed for backport to 2.2.x as r394653
(http://svn.apache.org/viewcvs?rev=394653&view=rev).
Comment 20 Ruediger Pluem 2006-04-22 15:26:35 UTC
Backported to 2.2.x as r396050 (http://svn.apache.org/viewcvs?rev=396050&view=rev).
Comment 21 Ruediger Pluem 2006-04-22 15:27:27 UTC
*** Bug 39267 has been marked as a duplicate of this bug. ***