Bug 38962 - retry timeout not honored when configured in sticky mode
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy_balancer
Version: 2.2.0
Hardware: Other AIX
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
Reported: 2006-03-14 14:02 UTC by Christian BOITEL
Modified: 2007-05-31 05:43 UTC



Attachments
Unified diff for patch to apply to mod_proxy_balancer.c (2.03 KB, patch)
2006-03-14 14:04 UTC, Christian BOITEL

Description Christian BOITEL 2006-03-14 14:02:58 UTC
Configure Apache to use a balancer that forwards requests to, say, a single 
Tomcat, with failover disabled (nofailover=On). Configure Tomcat with a 
jvmRoute (appended to JSESSIONID) and deploy a sample test.jsp page that 
simply displays "hello world !".

1/ Open your favourite browser and call the test JSP page: OK
2/ Stop your Tomcat and call the test JSP page: you get the 503 error page and 
your worker is flagged in error state
3/ Don't close your browser; call the page again: you still get the 503 page, 
but the Apache error log shows "All workers are in error state for 
route (xxxxx)" 
4/ Don't close your browser (to keep the session cookie)
5/ Restart your Tomcat and wait at least 90s (the default retry timeout for a 
worker is 60s; the extra time gives Tomcat a chance to complete its startup)
6/ Call your test JSP page with the browser left open in step 4: you still get 
a 503 error
7/ Open a new browser and call the test JSP page: it works!

The problem is in mod_proxy_balancer.c, inside the find_session_route function: 
it finds the worker associated with the route provided by the client but never 
calls ap_proxy_retry_worker to check whether it is time to try the worker again.
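
To illustrate the missing check, here is a minimal sketch using simplified stand-in types rather than the real httpd structures. The names worker_t, retry_worker and find_route_worker are hypothetical and exist only for this illustration; the real ap_proxy_retry_worker signature is not reproduced here, and this is not the attached patch. It only shows the idea of re-testing the retry interval before rejecting a worker that is in error state.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a balancer worker. */
typedef struct {
    const char *route;   /* route name, e.g. the Tomcat jvmRoute */
    int in_error;        /* non-zero while the worker is flagged in error */
    time_t error_time;   /* when the worker was flagged in error */
    time_t retry;        /* retry interval in seconds */
} worker_t;

/* Clear the error flag once the retry interval has elapsed.
 * Returns 1 if the worker may be used, 0 otherwise. */
int retry_worker(worker_t *w, time_t now)
{
    if (w->in_error && now - w->error_time >= w->retry) {
        w->in_error = 0;
    }
    return !w->in_error;
}

/* Look up the worker for a sticky route. Calling retry_worker here is
 * the step the description above reports as missing: without it, a
 * worker flagged in error is rejected forever for that route. */
worker_t *find_route_worker(worker_t *workers, size_t n,
                            const char *route, time_t now)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(workers[i].route, route) == 0) {
            return retry_worker(&workers[i], now) ? &workers[i] : NULL;
        }
    }
    return NULL;
}
```

With a worker flagged at time 1000 and a 60 second interval, a lookup at 1030 is still rejected, while a lookup at 1090 clears the flag and returns the worker.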

Will attach the "diff -u" output as an attachment.
Comment 1 Christian BOITEL 2006-03-14 14:04:35 UTC
Created attachment 17892
Unified diff for patch to apply to mod_proxy_balancer.c
Comment 2 Ruediger Pluem 2006-06-27 12:08:14 UTC
Committed a slightly modified version to trunk as r417443
(http://svn.apache.org/viewvc?rev=417443&view=rev). Thanks.
Comment 3 Christian BOITEL 2006-08-23 07:06:44 UTC
Is it possible to commit the change to the 2.2 branch?

I was expecting the fix to appear in the recent 2.2.3 release.
Comment 4 Ruediger Pluem 2006-08-23 19:32:27 UTC
Thanks for the reminder. Proposed for backport as r434133
(http://svn.apache.org/viewvc?rev=434133&view=rev).
Comment 5 Dale Ogilvie 2007-05-30 20:42:53 UTC
Looks like I'm seeing this bug as well.

I am running Apache 2.2.3 on RedHat EL 5. I am trying to use Apache to load
balance between two local instances of tomcat in order to utilize the vast
quantities of RAM on our production server.

My httpd setup looks like this:

<Proxy balancer://tomcat>
    BalancerMember ajp://localhost:8009 min=10 max=100 route=tomcat1 loadfactor=1 retry=120
    BalancerMember ajp://localhost:8010 min=10 max=100 route=tomcat2 loadfactor=1 retry=120
</Proxy>

<Location /balancer-manager>
    SetHandler balancer-manager
    Order deny,allow
    Deny from all
    Allow from .trimblecorp.net
</Location>

ProxyPass /dscgi/ds.py/ balancer://tomcat/docushare/dsweb/ stickysession=JSESSIONID nofailover=On
ProxyPass /docushare balancer://tomcat/docushare stickysession=JSESSIONID nofailover=On
ProxyPass /docushare/ balancer://tomcat/docushare/ stickysession=JSESSIONID nofailover=On

The problem is that if one of the workers gets into error status, any client
with a JSESSIONID referencing that route is never able to receive a reply,
Apache *always* responds with a 503 – Temporarily unavailable, *until* another
request is successful. I expected with "retry=120" that after 120 seconds the
client would be able to use the errored out worker, but this is *not* the case.

Test case:

1. Start tomcats
2. Access /docushare, this succeeds and returns a JSESSIONID cookie referencing
the member e.g. JSESSIONID=BC90C156669FDF0194657FF27EC3AF99.tomcat2
3. Stop tomcats to simulate a backend failure 
4. Access /docushare again in the same browser session, this fails with a 503
error (as expected). Balancer-manager shows tomcat1 is OK and tomcat2 is Err.
Error_log shows: All workers are in error state for route (tomcat2) 
5. Start tomcats again 
6. Wait for 120+ seconds to allow retry=120 to take effect
7. Access /docushare *using the session with the tomcat2 cookie*, expect
success, get 503 error. I can repeat this step ad nauseam without ever getting a
successful response.
Error_log shows: All workers are in error state for route (tomcat2) 
8. To resolve the issue, delete the JSESSIONID cookie from the client or open up
a new browser and access /docushare. Either of these seems to solve the problem
for the "cookied" browser session.
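
For reference, the reason the cookie matters in step 8 is that the route suffix pins the request to one worker. A minimal sketch of extracting that suffix, assuming the route is whatever follows the first '.' in the session value (session_route is a hypothetical helper for illustration, not a function from httpd):

```c
#include <stddef.h>
#include <string.h>

/* Return the route part of a sticky session value, or NULL if there
 * is none. Assumption: the route is the text after the first '.'. */
const char *session_route(const char *session_id)
{
    const char *dot;

    if (session_id == NULL) {
        return NULL;
    }
    dot = strchr(session_id, '.');
    if (dot == NULL || dot[1] == '\0') {
        return NULL;   /* no '.' or nothing after it: no route */
    }
    return dot + 1;
}
```

For the cookie in step 2 this yields the route tomcat2, so every request carrying that cookie is sent to the worker in error state. A client without the cookie has no route, and the balancer is free to pick any worker.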
Comment 6 Ruediger Pluem 2007-05-31 05:43:47 UTC
This should be fixed by the patch mentioned in comment #2. This fix is already
part of httpd 2.2.4. So please apply the patch mentioned above or upgrade to
2.2.4 to resolve your problem.