Configure Apache to use a balancer that routes requests to a single Tomcat, with failover disabled (nofailover=On). On Tomcat, set a jvmRoute (appended to the JSESSIONID) and deploy a sample test.jsp page that simply displays "hello world!".

1. Open your favourite browser and request the test JSP page: OK.
2. Stop Tomcat and request the page again: you get the 503 error page and the worker is flagged in error state.
3. Without closing your browser, request the page again: you still get the 503 page, and the Apache error log shows "All workers are in error state for route (xxxxx)".
4. Keep your browser open (to retain the session cookie).
5. Restart Tomcat and wait at least 90s (the default retry timeout for a worker is 60s; the extra time gives Tomcat a chance to complete its startup).
6. Request the test JSP page from the browser left open in step 4: still a 503 error.
7. Open a new browser and request the test JSP page: it works!

The problem is in mod_proxy_balancer.c, inside the find_session_route function: it finds the worker associated with the route provided by the client, but never calls ap_proxy_retry_worker to check whether it is time to try the worker again. Will attach the "diff -u" output as an attachment.
Created attachment 17892 [details] Unified diff for patch to apply to mod_proxy_balancer.c
Committed a slightly modified version to trunk as r417443 (http://svn.apache.org/viewvc?rev=417443&view=rev). Thanks.
Is it possible to commit this change to the 2.2 branch? I was expecting the fix to appear in the recent 2.2.3 release.
Thanks for the reminder. Proposed for backport as r434133 (http://svn.apache.org/viewvc?rev=434133&view=rev).
Looks like I'm seeing this bug as well. I am running Apache 2.2.3 on Red Hat EL 5. I am trying to use Apache to load balance between two local instances of Tomcat in order to utilize the vast quantities of RAM on our production server. My httpd setup looks like this:

<Proxy balancer://tomcat>
    BalancerMember ajp://localhost:8009 min=10 max=100 route=tomcat1 loadfactor=1 retry=120
    BalancerMember ajp://localhost:8010 min=10 max=100 route=tomcat2 loadfactor=1 retry=120
</Proxy>

<Location /balancer-manager>
    SetHandler balancer-manager
    Order deny,allow
    Deny from all
    Allow from .trimblecorp.net
</Location>

ProxyPass /dscgi/ds.py/ balancer://tomcat/docushare/dsweb/ stickysession=JSESSIONID nofailover=On
ProxyPass /docushare balancer://tomcat/docushare stickysession=JSESSIONID nofailover=On
ProxyPass /docushare/ balancer://tomcat/docushare/ stickysession=JSESSIONID nofailover=On

The problem is that if one of the workers gets into error status, any client with a JSESSIONID referencing that route never receives a reply: Apache *always* responds with 503 Temporarily Unavailable, *until* another request is successful. I expected that with "retry=120" the client would be able to use the errored-out worker again after 120 seconds, but this is *not* the case.

Test case:
1. Start the Tomcats.
2. Access /docushare; this succeeds and returns a JSESSIONID cookie referencing the member, e.g. JSESSIONID=BC90C156669FDF0194657FF27EC3AF99.tomcat2.
3. Stop the Tomcats to simulate a backend failure.
4. Access /docushare again in the same browser session; this fails with a 503 error (as expected). The balancer-manager shows tomcat1 as OK and tomcat2 as Err. The error log shows: All workers are in error state for route (tomcat2).
5. Start the Tomcats again.
6. Wait 120+ seconds to allow retry=120 to take effect.
7. Access /docushare *using the session with the tomcat2 cookie*; expect success, get a 503 error. I can repeat this step ad nauseam without ever getting a successful response. The error log shows: All workers are in error state for route (tomcat2).
8. To resolve the issue, delete the JSESSIONID cookie from the client, or open up a new browser, and access /docushare. Either of these seems to solve the problem for the "cookied" browser session.
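For anyone reproducing this, the directives that interact in the report above can be distilled to the following minimal sketch (ports, routes, paths, and timeouts are illustrative, taken from the configuration quoted earlier):

```apache
<Proxy balancer://tomcat>
    # retry=N: seconds a worker stays in error state before it becomes
    # eligible to be tried again.
    BalancerMember ajp://localhost:8009 route=tomcat1 retry=120
    BalancerMember ajp://localhost:8010 route=tomcat2 retry=120
</Proxy>

# stickysession pins each client to the worker whose route matches its
# JSESSIONID suffix; nofailover=On forbids sending that client anywhere
# else.  So for a "cookied" session to ever succeed again, the balancer
# must actually re-check the recovered worker's retry window, which is
# exactly the check the unpatched find_session_route skipped.
ProxyPass /docushare balancer://tomcat/docushare stickysession=JSESSIONID nofailover=On
```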
This should be fixed by the patch mentioned in comment #2. This fix is already part of httpd 2.2.4. So please apply the patch mentioned above or upgrade to 2.2.4 to resolve your problem.