I have in my config:

    <VirtualHost 10.10.11.108:443>
        ServerName test
        DocumentRoot "/var/www/test/"
        ErrorDocument 500 "/error/50x.php"
        ErrorDocument 502 "/error/50x.php"
        ErrorDocument 503 "/error/50x.php"
        ErrorDocument 403 "/error/403.html"
        ErrorDocument 404 "/error/404.html"
        ProxyPass /error/ !
        ProxyPass /lb/ !
        RewriteEngine On
        RewriteRule ^/$ /test/ [R]
        <Location /lb>
            SetHandler balancer-manager
        </Location>
        ProxyTimeout 3000
        KeepAlive Off
        ProxyPass / balancer://test/ stickysession=JSESSIONID nofailover=On
        ProxyPassReverse / http://10.10.20.51:8080/
        ProxyPassReverse / http://10.10.21.51:8080/
        <Proxy balancer://dashboard/>
            BalancerMember http://10.10.20.51:8080 route=node1 retry=0
            BalancerMember http://10.10.21.51:8080 route=node2 retry=0
        </Proxy>
        ProxyPreserveHost On
        ProxyErrorOverride On
        SSLEngine on
        SSLCertificateFile /etc/httpd/certs/_.localhost.com.crt
        SSLCertificateKeyFile /etc/httpd/certs/_.localhost.com.key
        SSLCertificateChainFile /etc/httpd/certs/some_intermediate.crt
        # LogLevel debug
        ErrorLog /var/log/httpd/test.error.log
        CustomLog /var/log/httpd/test.access.log combined
    </VirtualHost>

When Apache 2.2.8 starts, everything works fine, and if I go to https://10.10.11.108/lb/ I see something like this:

    StickySession Timeout FailoverAttempts Method
    JSESSIONID    0       1                byrequests

    Worker URL              Route  RouteRedir Factor Set Status Elected To   From
    http://10.10.20.51:8080 node1             1      0   Ok     19013   19M  143M
    http://10.10.21.51:8080 node2             1      0   Ok     1602    752K 5.1M

However, sometimes if I modify other virtual hosts and do:

    # apachectl graceful

the load balancer loses stickiness, and if I go to https://10.10.11.108/lb/ I see something like this:

    StickySession Timeout FailoverAttempts Method
    JSESSIONID    0       1                byrequests

    Worker URL              Route  RouteRedir Factor Set Status Elected To   From
    http://10.10.20.51:8080                   0      0   Ok     19013   19M  143M
    http://10.10.21.51:8080 node1             1      0   Ok     1602    752K 5.1M

If you do:

    /etc/init.d/httpd restart

it works fine again.
Please try with 2.2.9
Tried with 2.2.9; still not fixed. I believe my report is a duplicate of this: https://issues.apache.org/bugzilla/show_bug.cgi?id=42621

For now my "simple solution" is a Perl wrapper around the Apache reload. Basically: do apachectl graceful, read the configs from a Perl script, and go to the balancer-manager web page to fix the routes.
*** Bug 42621 has been marked as a duplicate of this bug. ***
Created attachment 26555 [details] keepalive.py - Python script to keep an apache process alive indefinitely by using the keepalive issue
I have seen the exact same problem with mod_proxy_balancer losing its routes when you do an apachectl graceful. Here is my relevant config:

    ProxyPass /balancer-manager !
    <Proxy balancer://webmail>
        BalancerMember http://boreas.sp:80 route=boreas loadfactor=1
        BalancerMember http://chinook.sp:80 route=chinook loadfactor=1
        BalancerMember http://zephyrus.sp:80 route=zephyrus loadfactor=1
        ProxySet lbmethod=byrequests
    </Proxy>
    ProxyPass / balancer://webmail/ stickysession=WEBMAILID
Oh, here is the server info (server runs as a VM in ESX):

    FreeBSD kottke.kinetic.more.net 7.3-RELEASE-p2 FreeBSD 7.3-RELEASE-p2 #0: Mon Jul 12 19:23:19 UTC 2010 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
    Server version: Apache/2.2.16 (FreeBSD)
    Server built:   Aug 16 2010 15:38:53
WONTFIX - 2.2.x does not guarantee local changes via balancer-manager are kept
The comment on the WONTFIX above suggests that the issue has been misinterpreted. The issue is not that local changes made via balancer-manager are being lost; that behaviour is understood. The issue is that after making a change to the configuration file (no changes via balancer-manager) and then doing a graceful restart, the routes can become offset in a way that no longer matches the configured servers.

For example, if I have this:

    <Proxy balancer://balancer1/>
        BalancerMember http://host1 route=host1
        BalancerMember http://host2 route=host2
        BalancerMember http://host3 route=host3
    </Proxy>

... and I comment out host1, then do a graceful restart, I may end up with Apache behaving like this:

    http://host2 route=host1
    http://host3 route=host2

Some observed notes:
+ Problems appear when new balancer blocks are added or removed, or balancer members are added or removed.
+ The route names can even cross from one balancer configuration block to another!
+ It doesn't always happen; we see it on active systems.
+ A subsequent graceful restart usually fixes the routes.
+ When it does happen, it happens consistently: we have multiple httpd servers polling Subversion for configuration changes, then checking them out and doing a graceful restart. In this situation, all of our httpd servers tend to develop the same route problems at the same time.

This issue has various impacts on applications. In the worst case we saw it direct all traffic for a balancer to a single balancer member, which couldn't handle the load.
Same here. On a busy system, changing from a one-active-plus-hot-spare backend configuration to a sticky load-balanced setup, activated by a reload, causes node1 and node2 sessions to be routed to node1 at first; after a restart they are correctly routed to separate nodes, which means half of the users lose their session info. The observed workaround is to always do a restart.
Reproduced on OEL 5 with apache httpd 2.2.17. I'll try to reproduce on 2.2.22.
*** Bug 45950 has been marked as a duplicate of this bug. ***
I have been able to reproduce this on 2.2.22
Can you check w/ 2.4.x and/or trunk?
The problem still exists with 2.2.24. I have no idea about 2.4, though; that would be a major change in our environment, which we don't intend to make.
Hi there,

Can this bug be fixed in the 2.2 branch? I'm using RHEL 6 and it would be a huge and risky move to migrate to Apache 2.4. I have more than 100 virtual hosts in my Apache reverse proxy; some of them are balanced with mod_proxy_balancer to backend servers (some are JBoss AJP and some others are Apache/PHP), but the vast majority are simple ProxyPass directives.

The annoying thing with this bug is that sometimes I need to make little adjustments to the config (e.g. add a new ProxyPass directive in an existing vhost), most of the time in vhosts not using mod_proxy_balancer, and when I do a graceful for the changes to take effect, the vhosts using mod_proxy_balancer sometimes forget the routes: suddenly all clients are directed to the first server, making half of the users lose their sessions and overloading one server while the other sits idle.

Right now I'm using Apache 2.2.27 and the problem still happens.
Yes, no one seems to care. Almost like using Oracle software. Is at least someone reading this?
>Yes, no one seems to care. Almost like using Oracle software.
>Is at least someone reading this?

Ha ha ha!

The 2.2 answer was given previously: WONTFIX - 2.2.x does not guarantee local changes via balancer-manager are kept

If you want to be constructive: can someone that needs it on the older release volunteer to confirm that it is resolved on 2.4, as requested some time ago?
(In reply to Jeff Trawick from comment #17)
> >Yes, no one seems to care. Almost like using Oracle software.
> >Is at least someone reading this?
>
> Ha ha ha!
>
> The 2.2 answer was given previously: WONTFIX - 2.2.x does not guarantee
> local changes via balancer-manager are kept
>
> If you want to be constructive:
>
> Can someone that needs it on the older release volunteer to confirm that it
> is resolved on 2.4, as requested some time ago?

Thanks Jeff for the answer, but that WONTFIX does not apply here. It is obvious (for me at least) that changes made through the balancer-manager page won't be kept between restarts. That's totally understandable.

The thing here is that the BalancerMember configuration gets mangled somehow when you issue a graceful. It sometimes swaps the route name, or the loadfactor changes to 0 when the configuration file says loadfactor=1. The end result is that all requests are sent to one server only.

My workaround here is to always do a restart instead of a graceful, but this is not really a good idea when hundreds of vhosts go through this reverse proxy (the active requests will be lost abruptly).

Unfortunately no one using 2.2 here is able to confirm that 2.4 fixes the problem. That at least would be an incentive to take the leap.
>The thing here is that BalancerMember configuration gets mangled
>somehow when you issue a graceful. It sometimes swaps the
>route name or the loadfactor changes to 0 when the configuration
>file says loadfactor=1. The end result of this is that all
>requests are now sent to one server only.

Thanks for reiterating that. (I recall that the shared memory setup is different for a graceful restart vs. a hard restart, but I haven't looked at this particular bug.)
We've tested 2.4 in our test environment and couldn't reproduce the reported behavior. Unfortunately it's not possible to update our production environment to 2.4 to validate this with real day by day usage.
(In reply to William Lovaton from comment #18)
> (In reply to Jeff Trawick from comment #17)
> > If you want to be constructive:
> >
> > Can someone that needs it on the older release volunteer to confirm that
> > it is resolved on 2.4, as requested some time ago?
>
> Thanks Jeff for the answer but that WONTFIX does not apply here. It is
> obvious (for me at least) that changes done through the balancer-manager
> page won't be kept between restarts. That's totally understandable.
>
> The thing here is that BalancerMember configuration gets mangled somehow
> when you issue a graceful. It sometimes swaps the route name or the
> loadfactor changes to 0 when the configuration file says loadfactor=1. The
> end result of this is that all requests are now sent to one server only.
>
> My workaround here is to always do a restart instead of a graceful but this
> is not really a good idea when hundreds of vhosts are going through this
> reverse proxy (the active requests will be lost abruptly).

That's exactly the problem. For a restart I would have to take one server out of the load balancing, wait until sessions have finished, restart it, move on to the next server, and so on. I just can't do that after every small config change.

> Unfortunately no one using 2.2 here is able to confirm that 2.4 fixes the
> problem. That at least would be an incentive to take the leap.

I can't reproduce it in our dev environment, because we just don't have multiple backend servers there. I would need to move one production server to 2.4 and rewrite all the vhost configs. If that were an option, I would have moved everything to 2.4 already.

If this won't be fixed, we'll just stick with mod_jk. It's buggy too, and its handling of the configuration files sucks, but it's the only alternative.
Created attachment 32086 [details] proposed patch Match the shared memory with workers according to their names.
Created attachment 32087 [details] proposed patch v2 Match the shared memory with workers according to their names.
Jan, thanks for the patch, which seems to work. Maybe we could use the apr_md5() of the worker's name to save space in SHM?
That's a good idea. It would also fix the problem of worker names longer than 255 bytes. Expect v3 soon.
Created attachment 32091 [details] proposed patch v3 Match the shared memory with workers according to MD5 hash of their names.
Thanks, looks good.

Isn't the ap_get_scoreboard_lb() call in init_balancer_members() also concerned by this?

Detail: apr_md5() does all the init/update/final in one go.
(In reply to Yann Ylavic from comment #27)
> Thanks, looks good.
>
> Isn't the ap_get_scoreboard_lb() call in init_balancer_members() also
> concerned by this?

Not sure I understand. I changed that ap_get_scoreboard_lb to ap_proxy_get_scoreboard_lb.

> Detail: apr_md5() does all the init/update/final in a one go

Done in the next attached patch. Thanks.
Created attachment 32093 [details] proposed patch v4 Match the shared memory with workers according to MD5 hash of their names. Now just with apr_md5().
(In reply to jkaluza from comment #28)
> (In reply to Yann Ylavic from comment #27)
> > Isn't the ap_get_scoreboard_lb() call in init_balancer_members() also
> > concerned by this?
>
> Not sure I understand. I change that ap_get_scoreboard_lb to
> ap_proxy_get_scoreboard_lb.

I'm not sure either, but I think this should be done, since the purpose is to determine whether the worker has already been initialized based on the status in SHM this time, so that lb parameters are not reset spuriously (below).
I already have it in my patches (or am I missing something)?

    -    slot = (proxy_worker_stat *) ap_get_scoreboard_lb(workers->id);
    +    slot = (proxy_worker_stat *) ap_proxy_get_scoreboard_lb(workers);
(In reply to jkaluza from comment #31)
> I already have it in my patches (or am I missing something)?
>
> -    slot = (proxy_worker_stat *) ap_get_scoreboard_lb(workers->id);
> +    slot = (proxy_worker_stat *) ap_proxy_get_scoreboard_lb(workers);

Sorry, it was based on your previous comment, and I didn't update the page to see the latest patch.

Maybe you can add a fast path in ap_proxy_get_scoreboard_lb() with something like:

    +void *ap_proxy_set_scoreboard_lb(proxy_worker *worker) {
    +    int i = 0;
    +    proxy_worker_stat *free_slot = NULL;
    +    proxy_worker_stat *s;
    +    unsigned char digest[APR_MD5_DIGESTSIZE];
    +
    +    if (!ap_scoreboard_image) {
    +        return NULL;
    +    }
         if (worker->s) {
             return worker->s;
         }
    +
    +    apr_md5(digest, (const unsigned char *) worker->name,
    +            strlen(worker->name));
    +
    +    /* Try to find out the right shared memory according to the hash
    +     * of worker->name. */
    +    while ((s = (proxy_worker_stat *)ap_get_scoreboard_lb(i++)) != NULL) {
    +        if (memcmp(s->digest, digest, APR_MD5_DIGESTSIZE) == 0) {
                 worker->s = s;
    +            return s;
    +        }
    +        else if (s->status == 0 && free_slot == NULL) {
    +            free_slot = s;
    +        }
    +    }
    +
    +    /* We failed to find out shared memory, so just use free slot */
         worker->s = free_slot;
    +    return free_slot;
    +}

so that the double call from init_balancer_members() does not hurt. (Note that I renamed it ap_proxy_set_scoreboard_lb, according to the changes...)

Apologies (again) for proposing partial things each time; this is the last one, I hope.
Created attachment 32095 [details] proposed patch v5
Created attachment 32098 [details] proposed patch v6 This version avoids using ap_proxy_set_scoreboard_lb() in init_balancer_members() when PROXY_HAS_SCOREBOARD is not defined, and sets worker->s (with palloc()ed memory) only if it was not set above with ap_proxy_set_scoreboard_lb().
Proposed for 2.2.x in r1630402.
We tried the patch first in our test environment without any noticeable issues. Today we rolled out the patch with Apache 2.2.29 on our production systems and noticed that the first worker in the balancer was favoured: it received about 90% of all the requests and the second worker received the leftover 10%, while the third and fourth workers received almost no requests.

The view of one of our balancer-managers:

    Worker URL      Route  RouteRedir Factor Set Status Elected To   From
    http://xrtv1afp xrtv1a            0      0   Ok     25944   20M  223M
    http://xrtv1bfp xrtv1b            0      0   Ok     2077    2.2M 24M
    http://xrtv1cfp xrtv1c            0      0   Ok     196     193K 1.9M
    http://xrtv1dfp xrtv1d            0      0   Ok     278     274K 1.6M

We noticed this behaviour on all our updated front proxies, which range around 150~200 Apache instances. After rolling back the patch and staying on the stock 2.2.29 release, the issue was gone.
According to the balancer-manager, the lbfactor seems to be 0. That shouldn't happen.

Are your balancer members also used as standalone workers (e.g. the same URL used before in a ProxyPass or <Proxy> section in your configuration)?
(In reply to Yann Ylavic from comment #37)
> According to the balancer-manager, the lbfactor seems to be 0. That
> shouldn't happen.
>
> Are your balancer members also used as standalone workers (eg. same URL used
> before in a ProxyPass or <Proxy> section in your configuration)?

That's happening to me too, after installing a test package for RHEL 6.

There is also another problem I just noticed after the update: I have two <Proxy balancer> directives in my config for the same domain, one for port 80 and another for port 443 (the secure connection is not mandatory yet). Before applying the patch, both balancer-manager pages showed independent values and stats for plain and secure; now they show exactly the same values. In my case the secure connection used to receive far fewer connections than the plain one.

The config for port 80 is this one:

    Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
    <Proxy balancer://ciklos-balancer>
        BalancerMember http://cdplin25.coomeva.nal:80 route=web1 loadfactor=1 retry=0
        BalancerMember http://cdplin26.coomeva.nal:80 route=web2 loadfactor=1 retry=0
        ProxySet stickysession=ROUTEID
        ProxySet nofailover=On
        ProxySet lbmethod=bybusyness
    </Proxy>
    ProxyPass /balancer-manager !
    ProxyPass / balancer://ciklos-balancer/
    ProxyPassReverse / balancer://ciklos-balancer/

And the config for port 443 is the following (the only difference is the balancer name):

    Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
    <Proxy balancer://ssl-ciklos-balancer>
        BalancerMember http://cdplin25.coomeva.nal:80 route=web1 loadfactor=1 retry=0
        BalancerMember http://cdplin26.coomeva.nal:80 route=web2 loadfactor=1 retry=0
        ProxySet stickysession=ROUTEID
        ProxySet nofailover=On
        ProxySet lbmethod=bybusyness
    </Proxy>
    ProxyPass /balancer-manager !
    ProxyPass / balancer://ssl-ciklos-balancer/
    ProxyPassReverse / balancer://ssl-ciklos-balancer/

Also note that the loadfactor is 1 but the balancer-manager shows 0 in both cases, even after a hard stop/start sequence.
(In reply to Yann Ylavic from comment #37)
> According to the balancer-manager, the lbfactor seems to be 0. That
> shouldn't happen.
>
> Are your balancer members also used as standalone workers (eg. same URL used
> before in a ProxyPass or <Proxy> section in your configuration)?

We use this type of configuration:

    <Proxy balancer://xrtv1cluster>
        BalancerMember http://xrtv1afp route=xrtv1a
        BalancerMember http://xrtv1bfp route=xrtv1b
        BalancerMember http://xrtv1cfp route=xrtv1c
        BalancerMember http://xrtv1dfp route=xrtv1d
        ProxySet stickysession=xrtv1balanceid
    </Proxy>

and then we proxy through the balancer:

    ProxyPass / balancer://xrtv1cluster/

However, we do request the server-status page every now and then directly from the workers.

The lbfactor should be the default (1), and I see that currently (without the patch) it shows this as well:

    Worker URL      Route  RouteRedir Factor Set Status Elected To   From
    http://xrtv1afp xrtv1a            1      0   Ok     9914    9.6M 91M
    http://xrtv1bfp xrtv1b            1      0   Ok     9926    9.4M 93M
    http://xrtv1cfp xrtv1c            1      0   Ok     10063   9.6M 102M
    http://xrtv1dfp xrtv1d            1      0   Ok     9943    10M  94M
Created attachment 32159 [details] proposed patch v7 Thanks for testing and reporting the defect. The previous patch missed the (needed) unicity per vhost and per balancer for the workers and balancer members. This new patch should fix this. Could you please give it a try?
(In reply to Yann Ylavic from comment #40) > The previous patch missed the (needed) unicity per vhost and per balancer > for the workers and balancer members. s/unicity/uniqueness/
Thanks Yann for your help. Does it solve the problem about the load factor being 0 even when it's explicitly set to 1 in the config file?
(In reply to William Lovaton from comment #42)
> Does it solve the problem about the load factor being 0 even when it's
> explicitly set to 1 in the config file?

Yes, it should: the balancer member was being initialized as a normal worker, without the specific balancer parameters (hence those were all 0). I'd appreciate it if you could verify this with your configuration, though.
We've implemented the new patch in our test environment and used ApacheBench to test the system. As far as we can currently see, the new patch works as intended: the lbfactor is set to 1, and the requests are balanced evenly over the two workers.

    LoadBalancer Status for balancer://xrtv1cluster
    StickySession           Timeout FailoverAttempts Method
    balancer://xrtv1cluster 0       1                byrequests

    Worker URL           Route  RouteRedir Factor Set Status Elected To   From
    http://xrtv1afp-test xrtv1a            1      0   Ok     1776    473K 626K
    http://xrtv1bfp-test xrtv1b            1      0   Ok     1775    473K 626K
(In reply to Ferry Manders from comment #44)
> As far as we can currently see the new patch works more as intended
> The lbfactor is set to 1 and also the requests are balanced evenly over the
> 2 workers.

Thanks for testing. I'll propose this new patch instead of the previous one for the 2.2.x backport.
Backport proposal (2.2.x) updated in r1635084.
Fixed in 2.2.30 (r1680920).