Bug 45405 - Allow binding port to be set for individual workers for proxy requests
Summary: Allow binding port to be set for individual workers for proxy requests
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_proxy (show other bugs)
Version: 2.5-HEAD
Hardware: All All
: P2 enhancement with 4 votes (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: PatchAvailable
: 42013 (view as bug list)
Depends on:
Blocks:
 
Reported: 2008-07-16 00:32 UTC by rahul
Modified: 2018-02-25 20:38 UTC (History)
2 users (show)



Attachments
Patch to allow binding port to be set per worker in mod_proxy (5.45 KB, patch)
2008-07-16 00:32 UTC, rahul
Details | Diff
updated patch with both server wide and workerspecific bind (8.80 KB, patch)
2008-07-17 07:16 UTC, rahul
Details | Diff
Updated patch correcting the mistakes :) (8.73 KB, patch)
2008-07-18 02:25 UTC, rahul
Details | Diff
Changes suggested applied. (9.05 KB, patch)
2008-07-18 12:08 UTC, D. Stussy
Details | Diff
A tiny ruby test file to simulate binding ports (1.25 KB, application/octet-stream)
2008-07-21 08:11 UTC, rahul
Details
updated patch (8.74 KB, patch)
2008-07-22 00:02 UTC, rahul
Details | Diff
updated patch (8.42 KB, patch)
2008-07-22 23:40 UTC, rahul
Details | Diff
Updated patch (8.75 KB, patch)
2008-07-23 14:59 UTC, D. Stussy
Details | Diff
Updated patch - against httpd-2.2.20 (8.77 KB, patch)
2011-09-03 06:18 UTC, D. Stussy
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description rahul 2008-07-16 00:32:45 UTC
Created attachment 22264 [details]
Patch to allow binding port to be set per worker in mod_proxy

This is an enhancement to what is described in
https://issues.apache.org/bugzilla/show_bug.cgi?id=44806

Allow a binding port and ip-address to be set for individual workers for out going proxy requests.

The format of configuration is

<Proxy http://url>
ProxySet bind=myhost.com:port+range
</Proxy>

e.g
<Proxy http://url>
ProxySet bind=myhost.com:8000+1000
</Proxy>

The same can also be used in balancer sections for
BalancerMember in <Proxy balancer://cluster>
Comment 1 D. Stussy 2008-07-16 11:12:38 UTC
Comments:

1)  In this version, you're still defining variables inside the for(;;) statement.  As noted in the parent enhancement 44806, that is not a portable construction for some compilers.  Please pull the declarations out of the for() statements.  Otherwise, except for a small change in which structure the data is stored in, the code looks almost identical to our prior solution, so it should work without additional extensive testing.

2)  My thought was to have BOTH a virtual-host-wide "ProxyBindAddress" and an individual setting for a proxy worker if separately defined.  Therefore, when a worker is first initialized, shouldn't it copy the bind-address info from the "conf" structure to the "worker" structure (which maybe subsequently overwritten by a worker-specific directive)?  Simply copying the data from one structure to the other should suffice, INCLUDING copying the pointer - as "conf" should never be released as long as it has a "worker" structure under it.  I make one assumption:  That at least one "worker" structure will always exist for each "conf" structure (I have not confirmed that in the code).

As such:
  changes to proxy_util.c would be as per this patch, not 44806.
  changes to mod_proxy.c from both would be combined into a parsing routine that
     passes the character string and addresses for 3 parameters.  The fourth
     parameter, "...->bindopt_set" could be determined from the return code of
     the parsing subroutine (char *; NULL == success, otherwise error message).
     There would be minimal code changes to both line ranges for the TWO calls
     to the parser, one for "ProxyBindAddress" and one for "ProxySet ... bind=".
   + Also, a "new_worker()" routine needs to copy data from "conf" to "worker".
  changes to mod_proxy.h would have BOTH changes, one from each enhancement.

for mod_proxy.c:
char *parse_bind_address(char *addr, apr_sockaddr_t **bind_addr, 
     apr_port_t *bind_port, apr_port_t *bind_range)
move the parsing code into here - adjusting as necessary.

from main configuration - ProxyBindAddress()
set_proxy_bindaddr(cmd_parms *parms, void *dummy, const char *addr)
+{
+    char *range, *host, *scope_id;
+    apr_port_t port;
+    int r = 0;
+
+    proxy_server_conf *psf =
+    ap_get_module_config(parms->server->module_config, &proxy_module);
+
++   char *rc;
++   rc = parse_bind_address( addr, &psf->bind_addr, &psf->bind_port, &psf->bind_range)
++   if (rc == NULL) {
++       psf->bind_idx = 0;
++       psf->bindopt_set = 1;
++   } 
+    return rc;

from worker configuration - ProxySet() ... bind=
+    else if (!strcasecmp(key, "bind")) {
++       char *rc;
++       rc = parse_bind_address( (char*)val, &worker->bind_addr,
++                                &worker->bind_port, &worker->bind_range)
++       if (rc == NULL) {
++           worker->bind_idx = 0;
++           worker->bindopt_set = 1;
++       }
++       return rc;
+    }
     else {
         return "unknown Worker parameter";

Configuration notes:
 - Workers defined via ProxySet BEFORE ProxyBindAddress would inherit the default settings of no info (unspecified address, port 0, range 1).
 - Workers defined via ProxySet AFTER ProxyBindAddress should inherit the ProxyBindAddress settings in effect for its scope (main config, virtual host, etc).
Comment 2 rahul 2008-07-17 06:40:37 UTC
> 1)  In this version, you're still defining variables inside the for(;;)
> statement.  As noted in the parent enhancement 44806, that is not a portable
> construction for some compilers.  Please pull the declarations out of the for()

oops sorry, some habits are hard to break :(

> Configuration notes:
>  - Workers defined via ProxySet BEFORE ProxyBindAddress would inherit the
> default settings of no info (unspecified address, port 0, range 1).
>  - Workers defined via ProxySet AFTER ProxyBindAddress should inherit the
> ProxyBindAddress settings in effect for its scope (main config, virtual host,
> etc).

I am not sure relying on the order of definition is a good thing. Perhaps a better idea would be to delay the merge of server and worker conf until the actual connection has to be made?

Comment 3 rahul 2008-07-17 07:16:45 UTC
Created attachment 22271 [details]
updated patch with both server wide and workerspecific bind
Comment 4 D. Stussy 2008-07-17 14:11:04 UTC
Generally, looks good.  I like your solution to the configuration issue I raised - with no need to copy conf->bind* to worker->bind*.  I see we got rid of "bindopt_set" as redundant (I hated that too).

Parsing: "if (range) ..." - closing brace.  Should "if((apr_parse_addr_port()..." be inside the closing brace?  If we have ONLY a hostname, we don't have a range, so we never set "host" - and thus apr_sockaddr_info_get() will fail.  Yet it seems as if "ip/hostname" with NO port or range should work.  Moving the "if((apr-parse..." outside of the "if (range) ..." closing brace appears to restore what syntax we defined.

Parsing:  Range can be set to zero INTERNALLY.  We lost the "r+1" bump-up ("conf-> or worker->bind_range = r + 1;") from the original patch before we converted the parser to a separate subroutine.  Fix:

Change comparison in apr_status_t bind_to_addr() from "<" to "<=", i.e.
+    for(i = 0; i <= range; ++i) { /* loop until we can bind correctly*/
(line 2277 after patch applied).

Additional comment:  "apr_pcalloc(p, sizeof(proxy_bind_addr))"
Do we know that this will never return NULL (i.e. "out of memory")?  If we can't guarentee that, then we need to check the value and abort parsing.
Assumption:  Allocated memory is zeroed.  Therefore, variables referenced but not set are zero (pointers NULL), especially the parameters to parse_bind_address().


Documentation:
-------------------------------------------------------------------------
Command: ProxyBindAddress   [<hostname/address-literal>][:<port>+<range>]
Command: ProxyPass ... bind=[<hostname/address-literal>][:<port>+<range>]
Command: ProxySet  ... bind=[<hostname/address-literal>][:<port>+<range>]
Context: server config, virtual host (Should <PROXY> sections be allowed too?)
Default: Address: unspecified address ("0.0.0.0" for IPv4; "::/128" for IPv6)
         Port: unspecified port (0 => use any port available - OS choice)
         Range:  0 (use only the port specified, if any)
Status:  Extension

This command is useful in order to restrict outbound proxy server requests to use the specified IP address(es) and/or (TCP) port range.  Such limits may be imposed by server firewall design as a security measure or for statistical data collection.  The ProxyPass and ProxySet versions of the command override the general declaration for a particular proxy worker or balancer.

Specifying a hostname or an address-literal shall bind all outbound proxy
requests to the IP address(es) specified or resolved.  DNS resolution is
used to translate hostnames at configuration time, so if a hostname maps to
multiple addresses, the address used may vary across requests. (Internally, all resolved addresses are stored.)  Should the DNS data for the hostname change, Apache will ignore any such changes until it reloads its configuration or is stopped and restarted.  If DNS resolution fails, no source binding takes place and an error is issued.  As noted elsewhere in the Apache documentation, IPv6 address literals that contain colons must appear in brackets.

Specifying a non-zero port locks in that port as the one used, or if a non-zero range is specified, the first one used.  Specifying a range indicates how many additional consecutive ports beyond the first may be used.  Specifying a range of "+0" means that only the specified port is used, thus causing serialization of requests.  This may deny additional requests made in parallel.  A warning may be issued to the system log for a range less than 8.  Explicitly specifying port 0, thus allowing the operating system to choose a random port, does not permit a range value and is equivalent to omitting the port value.

Examples:
  ProxyBindAddress 192.0.2.1:10000+10

This sets the IP address to the IPv4 address of 192.0.2.1
There are 11 valid ports for this range:  10000-10010
IPv6-only sites will not be reachable.  Only IPv4 sites will be contacted.

  ProxyBindAddress [2001:df8::1]:49151+9

This sets the IP address to the IPv6 address of 2001:df8::1
There are 10 valid ports for this range:  49151-49160
IPv4-only sites will not be reachable.  Only IPv6 sites will be contacted.

  ProxyBindAddress localhost

This sets the IP address to the IPv6 address of ::1 and the IPv4 address to 127.0.0.1 (assuming DNS records are set for both IP versions).  The operating system chooses the outbound port to use.  Both IPv4 and IPv6 sites are reachable.  However, as the loopback address is restricted, only sites on the same physical host can be reached.

Notes:
In the current implementation, it is not possible to specify separate port
ranges for different addresses or address families.  It is also considered an
error to specify an interface with an IPv6 address literal ("%" parameter).  Specifying port 0 with a range is invalid.  Specifying an address literal also locks the proxy server into the address family the literal belongs to.  Therefore, only a hostname produces an address family independent assignment assuming that both DNS A (IPv4) and AAAA (IPv6) records exist for the name.  Internally, not specifying the command at all actually skips any attempt to bind the source address and port of the outbound request (the default behavior before this command was added).  However, explicitly using the command with the unspecified address will cause address and port binding, even if the result turns out to be the same action as if the command were unspecified (i.e. port also 0).  Using the IPv4 unspecified address ("0.0.0.0") will force IPv4-only connections, and similarly with IPv6 ("::" forces IPv6 only).

Where a non-zero port value is specified, a range value less than the number of available workers (or child processes for non-threaded servers) may yield connection failures when all ports in the range are in use, even when there are available worker threads/processes.  Therefore, the range value should equal or exceed the maximum possible number of proxy workers.

Apache Hackers' note:
This code appears to be address family independent.  Therefore, if address family data other than IPv4 and IPv6 are returned from apr_parse_addr_port(), and such other address family data are accepted by bind_to_addr(), other address families may work.  The code only attempts binding where both the source and destination have the SAME address family.  If the source and destination have no common address family, "DECLINED" is returned.
Comment 5 rahul 2008-07-18 02:25:30 UTC
Created attachment 22278 [details]
Updated patch correcting the mistakes :)

>"apr_pcalloc(p, sizeof(proxy_bind_addr))"
>Do we know that this will never return NULL (i.e. "out of memory")?
-The pool will call abort_fn (system abort set in main.c unless overridden)
before returning null. This and the fact that we are allocating memory during configuration reading phase (i.e the pool is from conf rather than a custom pool) assures us that apache will exit rather than give us null for this pool.

>Allocated memory is zeroed. 
-apr_pcalloc is just defined as apr_palloc + memset in apr_pools.h
Comment 6 D. Stussy 2008-07-18 12:08:25 UTC
Created attachment 22283 [details]
Changes suggested applied.
Comment 7 D. Stussy 2008-07-18 12:23:59 UTC
I noted two problems:

1)  Error:  Change to how we parsed "range":  By not bumping by one in the parser as we used to do, we caused a potential division by zero at the modulo division in bind_to_addr().  Revert condition in for-loop, bump by one in the parser, and adjust boundary tests, syslog warning, and set range = 1 where user omits it.

2)  Performance issue -- Rotating across the port range on different calls to bind_to_addr():  As is, we seem to be starting each subsequent search for a port at the port after we started the last search.  The adjustment I proposed starts the search at the port after where we ENDED the last search.  There seemed little point re-checking FIRST ports we skipped last call, as opposed to checking ports that we might not have checked last time.  If we had checked them all last time, it doesn't matter where we start.

Also noted:  Setting "idx" = "i + 1" doesn't necessarily rotate in the matter intended.  It could also set "idx" == "range", but idx implicitly has a valuation range of 0...(range -1).  As we use it with modulo arithmetic, we got lucky and properly handled exceeding the maximum value.  I think this also meant that "idx" on calls after the first would NEVER be zero and therefore the first port might have been selected less often than the round-robin we intended.

I chose not to invalidate the previous updated patch as I have NOT tested my suggestion with corrections above applied.
Comment 8 rahul 2008-07-21 08:11:24 UTC
Created attachment 22293 [details]
A tiny ruby test file to simulate binding ports

Run this as ruby ./binder.rb,
This is the proposed (slightly changed from your patch) rotation scheme,
The logic in each function is same as in apache module.
I will post the patch as soon as testing is complete.
Comment 9 rahul 2008-07-22 00:02:42 UTC
Created attachment 22296 [details]
updated patch

The updated patch (Using the same rotation logic as in the ruby test file)

The configuration I used to test was:
 
ProxyRequests On
ProxyVia on

<Proxy http://webproxy.india.sun.com>
    Allow from all
    ProxySet bind=agneyam.india.sun.com:4040+10
</Proxy>

<Proxy http://vault.red.iplanet.com>
    Allow from all
    ProxySet bind=agneyam.india.sun.com:3030+10
</Proxy>


ProxyBindAddress agneyam.india.sun.com:9090+10
Comment 10 D. Stussy 2008-07-22 11:55:41 UTC
I believe you missed a couple of points that I made in last suggested patch.  These do not affect the outcome but do affect efficiency:

1)  Bind_to_addr() "const int range = bind->range + 1;":  This means that it will have to add one for every try per address family to bind to a port in the range.  I incremented the value in the parser because that is one exactly ONCE, not once per attempt to bind.  It repeats a fixed operation over and over which only need be done once forever (until reconfigured).

2)  Bind_to_addr() "bind->idx = idx + i + 1;":  Technically incorrect.  "idx" is defined to have values 0...(range-1), and this code can set "idx" to equal 2*range-1 or MORE.  The only reason the code works is the modulo arithimetic on line 2278.  Note that an optimizing compiler will realize that "idx" remains equal to "bind->idx" and thus compile this as "bind->idx += i + 1;".  I still think that this should be "bind->idx = (idx + i + 1) % range;" OR "bind->idx = (port - start + 1) % range;" (same result but perhaps earier to comprehend for someone not familiar with the code) to keep the variable inside our defined limits for it.

3)  New point:  parse_bind_address().  Since we have a "struct proxy_bind_addr", we don't need to pass by reference three structure elements.  We could pass the address to the structure itself instead.  This also allows us to set "bind->idx=0" in one place -- the parser, not two as we do now, and should anything else call the parser in the future, then the value won't be forgotten.
Comment 11 D. Stussy 2008-07-22 12:01:04 UTC
One last thing I noted:  parse_bind_address()
- Should explicitly return 0 upon success.
Comment 12 rahul 2008-07-22 23:40:24 UTC
Created attachment 22300 [details]
updated patch

1)  Bind_to_addr() "const int range = bind->range + 1;":  
-agreed

2)  Bind_to_addr() "bind->idx = idx + i + 1;":  Technically incorrect.
-agreed

3) proxy_bind_addr", We could pass the address to the structure 
- done

One caveat is that if the range defined is lesser than the number of workers
available, apache will start to return 404 after the range is exhausted. This is because the workers keep the backend connection open even after the connection completes. This causes a newly requested worker to not get a free port. (If the number of workers is lower than range, one of the workers currently holding the open connection will be used for the connection)

How ever, I dont see a way to check this during configuration time. (the number of workers may be modified after the proxy bind directives are run.)
Comment 13 D. Stussy 2008-07-23 14:59:18 UTC
Created attachment 22309 [details]
Updated patch

Updated patch - Cosmetic issues only - functionality not broken:

Changes:
+++ modules/proxy/proxy_util.c
@@ -2349,6 +2374,24 @@
...
+                ap_log_error(APLOG_MARK, APLOG_ERR, 0, s, "proxy: %s: can not bind to %s:%u+%u",
+                        proxy_function,bind->addr, bind->port, bind->range, NULL);
+                return DECLINED;

=>   ..., bind->range - 1, NULL);

Internally, it's one more than the admin specified in the configuration, so we have to subtract one to get the original value that was human-specified.


Comment in parser:
+    * we dont accept a range with out a port either.
without is one word in English.

Added two additional comment lines.


I believe we're done.  I have not actually tested the patch.
Comment 14 D. Stussy 2008-10-26 13:55:19 UTC
Request for review posted to developers mailing list today.
Comment 15 D. Stussy 2010-10-19 13:59:25 UTC
BUMP(after 2 years).  The patch is done, but not picked up by the HTTPD trunk.  Is there anything else that need be done?
Comment 16 Mina Galić 2010-11-17 10:03:00 UTC
(In reply to comment #15)
> Is there anything else that need be done?
It's quite good practice to also post to dev@
Comment 17 D. Stussy 2010-11-17 14:22:50 UTC
RE - Comment #16:  Note that I did exactly that <B>two years ago</B> (25 months) as noted by comment #14.
Comment 18 Eric Garreau 2010-11-18 06:43:48 UTC
surprisingly nobody answers, although I think it's a great piece of code which allows stronger firewall DMZ/MZ rules.

the associated documentation is clear, and the exchanges show the code quality has been studied thoroughly, so I don't see any reason to stay away from the trunk.

is there someone thinking this feature could create a major regression or performance drop ?  (I've not tested it)
Comment 19 D. Stussy 2010-11-18 20:51:58 UTC
I've been running it as a patch to my local copy of httpd 2.2.17 and have yet to notice any problems.  The machine is on an IPv4 /29 and virtually-hosts 5 IP addresses, and I do confirm that the outbound address I picked is not the default that the OS would pick - so I know it works.  My proxy function listens only on the "localhost" addresses (I have IPv4+IPv6), so there's little security risk for me.
Comment 20 Stefan Fritsch 2011-06-25 20:52:26 UTC
A simpler variant of this has already been committed some time ago, see PR 29404 and r1034916. Definitely still missing is documentation and support for ProxySet.

Selecting a port range is also not supported, but frankly I don't see the value of that.
Comment 21 D. Stussy 2011-06-25 22:02:25 UTC
The value of selecting the port range is that it can be pre-determined and/or reserved in advance and thus programmed into a firewall.  Some people prefer stricter control than others.  Some may want to reserve other ranges for other programs/services and thus need to tell apache to avoid such.

As far as documentation goes, you're the first to ask for it formally.  I'll see if I can generate such during July.  It won't be substantially different that what was presented in text except for the HTML tags that will be added and a diff file generated.

Note that this proposal predated the "simpler one" committed by two years (July 2008 vs. November 2010, but opened in 2004).  I'm supposed to know that someone came along, opened a bug in parallel, and did the same work when after 4 years, it still had not been committed -- as opposed to abandoned?  Regardless, it was [first] propsoed against Apache version 1 where this approach started with version 2 and ignored 1.  Also note that this version is more versitile - in that the address/port range may differ per worker or proxied target, not constant across all proxied connections.
Comment 22 Stefan Fritsch 2011-06-27 21:11:40 UTC
(In reply to comment #21)
Sorry, my comment was merely meant to document the current status, because I wanted to apply your patch and was surprised that some similar functionality was already present. I did not mean to imply that you should write the documentation or put any more work into your patch.

I would have preferred if your more complete patch would have been applied in the first place. The per-target configuration is definitely something I want (that's what I meant with ProxySet). Reverting the patch from PR 29404 and applying this one is still a possibility, but I am not yet decided, yet.
Comment 23 D. Stussy 2011-06-27 23:32:19 UTC
Thank you - although I cannot take 100% of the credit.  This was a combination effort between two people.  I was more responsible for the concept, QA, debugging, and documentation, while "Rahul" wrote the alpha version (cf. bug 44806).

What we need most is a proponent who can get this added to the trunk and to the production versions.  We've tried posting to the developers list, but no one seemed interested.

Our version also accepts hostnames and IPv6 literals.  I don't think the other one does.  I also think it was commited ONLY to trunk and not to production versions, because I don't remember seeing it present in 2.2.19.
Comment 24 D. Stussy 2011-09-03 06:18:25 UTC
Created attachment 27455 [details]
Updated patch - against httpd-2.2.20

The current patch has mismatches due to other things having been added to httpd.
Comment 25 Rainer Jung 2018-02-25 20:25:35 UTC
*** Bug 42013 has been marked as a duplicate of this bug. ***
Comment 26 Rainer Jung 2018-02-25 20:37:52 UTC
Undo spam change