SA Bugzilla – Bug 6508
Speeding up lookups on {trusted,internal,msa}_networks
Last modified: 2012-10-30 01:03:56 UTC
Prompted by the current brokenness in NetAddr::IP 4.034 and 4.035 (Bug 6507) and after Philip Prindeville pointed out that a Net::Patricia module might be a useful replacement or addition, I have instrumented the module Mail::SpamAssassin::NetSet with timing reports, and later added the use of Net::Patricia to it - with very encouraging (and worrying) results. Btw, the Net::Patricia module implements radix trie lookups, which are well suited for fixed size keys (such as 128 bits of a network address) and does a lookup in constant time O(k), proportional only to a size of a network address (fixed 128 bits), and independent of a size of the trie. As such it is commonly used in IP routing, efficiently finding a closest match from a set of available routes. See: http://en.wikipedia.org/wiki/Patricia_trie Here are some benchmarking results measured on real life mail, on an AMD quad core Opteron 2376: Elapsed time PER MESSAGE spent in NetSet lookups (sub contains_ip) for a set of n networks (using about 15 DNS white-/black lists): nets current ---- -------- 27 22 ms 1000 500 ms 7000 3700 ms Elapsed time for looking up ONE IP ADDRESS in a set of n networks: nets current patricia trie ---- -------- ------------- 27 1.5 ms 0.07-0.18 ms 1000 34 ms 0.10-0.20 ms 7000 250 ms 0.10-0.20 ms See also: Bug 5931 - add_cidr chokes on large amount of trusted_networks Conclusion 1: for anything above 4 or so networks it pays off to use Net::Patricia lookups. Conclusion 2: the Plugin/DNSEval.pm performs one lookup on the trusted_networks list for each DNSBL response, all of these are lookups on THE SAME IP address. It clearly calls for some caching of NetSet lookup results within processing of each message. Now, there is one caveat with using a Patricia trie for lookups: it finds the tightest match on listed CIDR networks - unlike our present sequential search, which returns the first match in the order of declared networks. For all practical purposes on valid data this makes no difference. The difference pops up on overlapping conflicting networks. Some of these are reported by lint, but not all. For this reason a couple of t/trust_path.t tests currently fail when Net::Patricia is installed and NetSet decides to use it. If Net::Patricia is not installed, the NetSet just uses the old sequential search on a list of NetAddr::IP objects. Guessing that Hudson does not have the Net::Patricia installed, I'd just roll in the change to trunk for ease of assessment, and can back off patricia or change the trust_path.t tests later on when we decide what would be the best. Would that be alright?
Created attachment 4819 [details] implement NetSet Patricia trie lookups and caching trunk: Bug 6508: Speeding up lookups on {trusted,internal,msa}_networks Sending lib/Mail/SpamAssassin/NetSet.pm Sending lib/Mail/SpamAssassin/PerMsgStatus.pm Sending lib/Mail/SpamAssassin/Util/DependencyInfo.pm Committed revision 1031095.
(In reply to comment #0) > Now, there is one caveat with using a Patricia trie for lookups: > it finds the tightest match on listed CIDR networks - unlike our > present sequential search, which returns the first match in the > order of declared networks. For all practical purposes on valid > data this makes no difference. The difference pops up on overlapping > conflicting networks. In my experience this has been an important feature. I know of a number of people that have setups like this: trusted_networks 10.20.30.0/24 internal_networks 10.20.30.0/24 !10.20.30.2/31 Where the /24 covers their MXes and the /31 covers their MSAs. I suppose that msa_networks solves this particular problem (I think) but I don't know what the uptake of msa_networks has been like... lots of people got things working long before I came up with msa_networks. I think I'd lean towards only using Net::Patricia if there are no "!" entires, or optionally. Daryl
> In my experience this has been an important feature. > I know of a number of people that have setups like this: > trusted_networks 10.20.30.0/24 > internal_networks 10.20.30.0/24 !10.20.30.2/31 > Where the /24 covers their MXes and the /31 covers their MSAs. The above produces a lint warning: $ spamassassin --lint warn: netset: cannot exclude 10.20.30.2/31 as it has already been included Similarly, of the six failing test cases in trust_path.t all but one produce a lint warning (if I counted correctly). The remaining one is perhaps questionable or maybe a lint test can be added. We have been telling people they cannot expect flawless operation in presence of lint errors or warnings. Following a principle of 'garbage-in, garbage-out', maintaining backwards compatibility with invalid setups is not always necessary or possible, especially when warnings _are_ being issued. But I agree the concept of these three sets of networks is hard to understand and may not even be applicable to every topology, so any hand-guiding in a form of documentation with examples and warnings is more than welcome.
> We have been telling people they cannot expect flawless operation > in presence of lint errors or warnings. Following a principle of > 'garbage-in, garbage-out', maintaining backwards compatibility with > invalid setups is not always necessary or possible, especially > when warnings _are_ being issued. Just to point out the obvious, and to ensure it stays that way... We're talking Target Milestone 3.4.0. With a target like that, backwards compatibility can be dropped, if documented and communicated properly in the release notes. If it actually introduces a noticeable advantage, even in edge-cases, *and* there are other methods to get equivalent behavior (msa_networks?).
(In reply to comment #3) > > In my experience this has been an important feature. > > I know of a number of people that have setups like this: > > trusted_networks 10.20.30.0/24 > > internal_networks 10.20.30.0/24 !10.20.30.2/31 > > Where the /24 covers their MXes and the /31 covers their MSAs. > The above produces a lint warning: > $ spamassassin --lint > warn: netset: cannot exclude 10.20.30.2/31 as it has already been included Crap, I forgot that smaller "conflicting" networks had to be declared first. I now remember arguing about why I wanted to do that too (so you didn't try to exclude something you already included or vice-versa). So, I think this may make it a non-issue, no? I think the smaller network will always be the accurate "answer". > Similarly, of the six failing test cases in trust_path.t all but one > produce a lint warning (if I counted correctly). The remaining one > is perhaps questionable or maybe a lint test can be added. It's quite possible that the tests are wrong. > We have been telling people they cannot expect flawless operation > in presence of lint errors or warnings. Following a principle of > 'garbage-in, garbage-out', maintaining backwards compatibility with > invalid setups is not always necessary or possible, especially > when warnings _are_ being issued. I just forgot about the ordering rule... if you get the warn from lint it won't actually add that part of the config (like it says). And with the smaller networks always listed first I think with Net::Patricia it'll DTRT.
(In reply to comment #4) > Just to point out the obvious, and to ensure it stays that way... > We're talking Target Milestone 3.4.0. > With a target like that, backwards compatibility can be dropped I actually think, now, that it won't change any behaviour.
To recap: the Net::Patricia support is done and works well, what is still needed is to polish the t/trust_path.t test which unnecessarily fails in the presence of Net::Patricia, or at least document the fact in 3.4.0 release notes.
(In reply to comment #7) > To recap: the Net::Patricia support is done and works well, > what is still needed is to polish the t/trust_path.t test > which unnecessarily fails in the presence of Net::Patricia, > or at least document the fact in 3.4.0 release notes. After working on this for a few hours, I think there are some caveats with using Net::Patricia that are not well documented. However, it appears the cases that fail in trust_path.t are incompatibilities with the syntax for {trusted,internal,msa}_networks. I'm adding some code to avoid the checks if Net::Patricia is installed and some documentation warning regarding the syntax for Net::Patricia not being as flexible as NetAddr::IP with overlapping networks. I believe we can safely call it poorly configured for the admins technical enough to want to utilize Net::Patricia. If it becomes more prevalent though we might want to figure out a lint warning for all of the syntaxes that aren't supported. Committing and closing. And I can now do a make test on trunk with every single optional module installed. This also works with and without Net::Patricia though I'll test it on an older perl after the commit and report back if there is a problem. svn commit -m 'Net::Patricia not as flexible as NetAddr::IP with overlapping networks. Updated documentation to help make this more apparent and also fixed trust_path.t to avoid tests for incompatible syntaxes.' Sending lib/Mail/SpamAssassin/Conf.pm Sending lib/Mail/SpamAssassin/Util/DependencyInfo.pm Sending t/trust_path.t Transmitting file data ... Committed revision 1373194. Regards, KAM
> This also works with and without Net::Patricia though I'll test it on an > older perl after the commit and report back if there is a problem. Also tested and passed on 5.8.6 on non-IPv6 system without Net::Patricia.
> After working on this for a few hours, I think there are some caveats with > using Net::Patricia that are not well documented. However, it appears the > cases that fail in trust_path.t are incompatibilities with the syntax for > {trusted,internal,msa}_networks. > > I'm adding some code to avoid the checks if Net::Patricia is installed and > some documentation warning regarding the syntax for Net::Patricia not being > as flexible as NetAddr::IP with overlapping networks. Documentation update: - document how Net::Patricia search differs from a sequential search - document/clarify IPv6 syntax in {trusted,internal,msa}_networks - remove the claim on flexibility trunk (3.4): Sending lib/Mail/SpamAssassin/Conf.pm Sending lib/Mail/SpamAssassin/Util/DependencyInfo.pm Sending t/trust_path.t Committed revision 1403592.