SA Bugzilla – Bug 6534
Evaluate UCEPROTECT
Last modified: 2012-08-13 08:04:55 UTC
http://www.uceprotect.net The owners of UCEPROTECT have requested testing in our weekly masscheck. I will add this to the weekly masscheck with "tflags nopublish" if there are no objections by Friday of this week. This is NOT a proposal to add this or any particular blacklist to production rules. I anticipate that tests like this will be only temporary, to be removed in a few weeks if performance is shown to be bad. I intend these tests either to identify promising blacklists or (more likely) to let us warn people away from blacklists with actual numbers.
I include these in my local configuration file with scores from 0.3 to 0.1 for the dnsbl-1 to dnsbl-3 lists respectively. I have not seen many hits. However, I have seen several hits on UCE-Prot's "backscatterer" list, ips.backscatterer.org. You may want to add this fourth list to your test.
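For anyone who wants to spot-check an address against these lists outside SpamAssassin: a DNSBL lookup is an A-record query on the reversed IPv4 octets prepended to the zone name. Below is a minimal Python sketch of that. The three uceprotect.net zone names are an assumption based on the list names mentioned above, ips.backscatterer.org is taken from the comment, and the function names are hypothetical.

```python
import ipaddress
import socket

# Assumed zone names: the three uceprotect.net zones are inferred from the
# "dnsbl-1 to dnsbl-3" list names; ips.backscatterer.org is the fourth list.
ZONES = [
    "dnsbl-1.uceprotect.net",
    "dnsbl-2.uceprotect.net",
    "dnsbl-3.uceprotect.net",
    "ips.backscatterer.org",
]


def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the query name resolves, i.e. the IP appears listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # NXDOMAIN and any other lookup failure are both treated as
        # "not listed" in this sketch.
        return False
```

Note that this sketch cannot tell "not listed" apart from a DNS failure, so a zone that is down will look like a zone with no hits.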
Note that obviously the backscatterer list is pointless on its own. It should be at least meta'd with BOUNCE_MESSAGE, if not something even more specific.
(In reply to comment #2)
> Note that obviously the backscatterer list is pointless on its own. It should
> be at least meta'd with BOUNCE_MESSAGE, if not something even more specific.
Then again, I'm not even sure of that. It's really only useful for blocking at the MTA: http://www.backscatterer.org/?target=usage There's not much point in "amplifying" the score for an already known BOUNCE_MESSAGE, and wasting lookups on that.
...But if one has reached the point where one is scoring a message in SA (having not previously decided to reject it for other reasons), then any element which scores is NOT being used by itself - and therefore, in this specific case, the backscatterer list is not standing on its own but merely contributing to the overall score. I have found that when it does hit, usually the message would be classified as spam without the list's contributing points. As for the amplification argument, I disagree: it is better to have redundant confirmation than to skip the redundancy and end up with an incorrect classification when the primary determining factors fail.
For Warren Togami: I sent you the private mail. The problem you had with your reply bouncing is that gmail does not generate proper "Received:" headers. They claim "with smtp" but omit the "from" clause - a violation of RFC 5321 and its predecessors (with 821 & 2821 being standard) required syntax.
> For Warren Togami: I sent you the private mail. The problem you had with your
> reply bouncing is that gmail does not generate proper "Received:" headers.
> They claim "with smtp" but omit the "from" clause - a violation of RFC 5321 and
> its predecessors (with 821 & 2821 being standard) required syntax.
BTW, I never heard a response to that mail (reposted to dev@). Some of the example DNSBLs you sent me are either broken or incorrect.
http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/wtogami/20_ubl.cf?revision=1060616&view=co
For example, ubl AFAICT is timing out most of the time, making it effectively unusable, and when it used to work its performance and FP rate were terrible.
http://www.uceprotect.net/en/index.php?m=6&s=11
Your UCEPROTECT examples had uceprotect.com, but isn't it supposed to be .net? It appears that .net works here while .com had no hits.
Adding wtogami/20_bug_6534_uceprotect.cf
Transmitting file data .
Committed revision 1062090.
"That means the UCEPROTECT-Network will gladfully ignore [the ASRG] and it's decisions or votings from now." - http://www.ietf.org/mail-archive/web/asrg/current/msg16741.html Today. Nice of them to announce their intention to ignore the official authority on spam mitigation.
I've been following this thread as well and am disappointed in all parties involved. While I have a hard time defending Claus after the above (and related) threads, I don't think the IETF has any business dictating a revenue model. Charging for an expedited version of an otherwise free service seems completely rational. However, with the above pettiness aside, SA is the perfect place for UCEPROTECT in a mail environment. Their revenue model and policies don't have to become ours any more than we want them to. In addition, UCEPROTECT's decision not to follow BCP07 is entirely within their rights. It's a BCP, not an RFC. Not that RFCs are required to be followed, but adherence to them is generally stricter. In addition, I've had fair success with local UCEPROTECT rules and would like to see how they compare against the general corpus.
I've operated my entire anti-spam career largely ignoring the "authorities", instead attempting to use measurement and statistics to find out for myself. Any authority can say whatever it wants about blacklist standards, but what matters most is how useful and safe a blacklist is as measured. That being said...
http://ruleqa.spamassassin.org/20110226-r1074804-n
UCEPROTECT_L1 is one of the worst performing of the DNSBL's were currently testing.
http://www.spamtips.org/2011/01/dnsbl-safety-report-1232011.html
http://ruleqa.spamassassin.org/20110226-r1074804-n/T_RCVD_IN_UCEPROTECT_L1/detail
It has very high overlaps with MSPIKE_BL at 80%, PSBL at 73% and HOSTKARMA_BL at 89%. Yet despite its similarities with those high safety rated blacklists, 2% of our ham corpus from the past week hit UCEPROTECT_L1, and this has been pretty consistent for the previous four weeks. As with the ampr.org fellow above who suggested typo-broken SpamAssassin rules containing UCEPROTECT, I'm afraid much anti-spam advice out there is given without adequate testing or even cursory examination of the statistics.
I suggested it FOR TESTING. I simply noted that it was not among the usual DNSBL suspects that SA normally scores. I didn't say that it was good; only that it was missing.
> I suggested it FOR TESTING. I simply noted that it was not among the usual
> DNSBL suspects that SA normally scores. I didn't say that it was good; only
> that it was missing.
I'm sorry for the misunderstanding. I was somehow under the impression that your example rules were what you used in production (thus tested) and I copied your rules verbatim at first. This became alarming as UBL is utterly broken at the server and UCEPROTECT had syntax errors.
(In reply to comment #11)
> > I suggested it FOR TESTING. I simply noted that it was not among the usual
> > DNSBL suspects that SA normally scores. I didn't say that it was good; only
> > that it was missing.
>
> I'm sorry for the misunderstanding. I was somehow under the impression that
> your example rules were what you used in production (thus tested) and I copied
> your rules verbatim at first. This became alarming as UBL is utterly broken at
> the server and UCEPROTECT had syntax errors.
I have to say that after running into an IP on the UCEPROTECT-powered backscatterer.org, I was shocked to see they charge money for delistings. I consider this to be unethical. It's akin to paying to be allowed to spam, and it severely harms their credibility. Something to be considered.
Regards,
KAM
IMO, charging for [express] delisting is akin to extortion and may be illegal in some jurisdictions around the world. However, the reason why some IP address is listed in the first place is the demonstration of merit in considering the list.
(In reply to comment #13)
> IMO, charging for [express] delisting is akin to extortion and may be illegal
> in some jurisdictions around the world. However, the reason why some IP
> address is listed in the first place is the demonstration of merit in
> considering the list.
True, but a comment such as Warren's above, that this is "one of the worst performing of the DNSBL's were [sic] currently testing", means it fails on technical merit as well as on ethical considerations. I think we can close this ticket as "evaluated" and "failed". Warren?
Regards,
KAM
(In reply to comment #14)
> [...] fails on technical merit as well as ethical consideration.
> I think we can close this ticket as "evaluated" and "failed".
+1
+1
+1 but could we please fix ruleqa first so we can record some final data before it is removed from masscheck?
Time to close this?
DATE      SPAM%    HAM%    S/O    RANK  SCORE  NAME
20110319  30.5217  0.4457  0.986  0.73  0.00   T_RCVD_IN_UCEPROTECT_L1
20110409  43.1848  0.6689  0.985  0.70  0.00   T_RCVD_IN_UCEPROTECT_L1
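For reference, the S/O figures can be reproduced from the SPAM% and HAM% figures if S/O is taken to be spam% / (spam% + ham%). That formula is an assumption about how the ruleqa numbers are derived and was not checked against the ruleqa code. A small Python check against the two results above, with a hypothetical function name:

```python
def s_o_ratio(spam_pct, ham_pct):
    """Share of a rule's hit rate that comes from spam: spam% / (spam% + ham%)."""
    return spam_pct / (spam_pct + ham_pct)


# (SPAM%, HAM%) pairs copied from the two results above.
results = {
    "20110319": (30.5217, 0.4457),
    "20110409": (43.1848, 0.6689),
}

for date, (spam_pct, ham_pct) in results.items():
    print(date, round(s_o_ratio(spam_pct, ham_pct), 3))
# 20110319 0.986
# 20110409 0.985
```

Both values agree with the reported S/O figures, so a high S/O here still coexists with a HAM% that is far from zero.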
http://ruleqa.spamassassin.org/20110122-r1062119-n/T_RCVD_IN_UCEPROTECT_L1/detail Results from 1/22/2011
http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_UCEPROTECT_L1/detail Results from 4/9/2011
* Recent results indicate a 1.3% FP rate on the past week of ham, ranging up to 2.5% FPs in recent weeks. This is worse than our measurements in January.
* 88% spam overlap with RCVD_IN_MSPIKE_BL. (Most likely the only new DNSBL to be added to spamassassin-3.4. Look at its stellar spam detection performance with almost zero FPs!)
* 83% spam overlap with RCVD_IN_HOSTKARMA_BL.
* 63% spam overlap with RCVD_IN_PSBL.
* 62% spam overlap with RCVD_IN_PBL.
From multiple months of analysis showing roughly the same thing, it seems clear that UCEPROTECT is a little too unsafe and redundant for SpamAssassin users. I suppose we can remove it from masscheck for now. But I'd like to measure it again every few months to see if it has changed at all. I like being able to definitively warn people against stuff like this with fresh data.
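To make the overlap figures concrete: they are read here as "of the spam that hit T_RCVD_IN_UCEPROTECT_L1, the share that also hit the other rule". That reading is an assumption about the ruleqa report, not something confirmed in this bug. A minimal Python sketch with a hypothetical function name and made-up message IDs (not masscheck data):

```python
def overlap_pct(hits_a, hits_b):
    """Percentage of rule A's hits that rule B also hit."""
    if not hits_a:
        return 0.0
    return 100.0 * len(hits_a & hits_b) / len(hits_a)


# Hypothetical message IDs, for illustration only.
uceprotect_hits = {"m1", "m2", "m3", "m4", "m5"}
mspike_hits = {"m2", "m3", "m4", "m5", "m6", "m7", "m8"}

print(overlap_pct(uceprotect_hits, mspike_hits))  # 80.0
print(overlap_pct(mspike_hits, uceprotect_hits))  # lower: overlap is asymmetric
```

The measure is asymmetric. A high value in the first direction means the other list already catches most of what this one catches, which is the redundancy argument made above.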
How about closing this bug and leaving the tests in mass-check?
(In reply to comment #20)
> How about closing this bug and leaving the tests in mass-check?
Agreed. The votes are clear and masscheck data can be added when available. SA won't be implementing UCEPROTECT due to the previous results.
I'd like to propose removing or disabling these tests in the sandboxes (/trunk/rulesrc/sandbox/wtogami/20_bug_6534_uceprotect.cf). They have been set to nopublish since 2011, and while it's nice to test new BLs, keeping them there for such a long time puts an unnecessary load on the weekly masschecks. Comments and votes, please!
+1 Any BLs that are active in sandbox with no one actively reviewing the results should and really must be disabled. Net masschecks are taking too long on the weekend!
FTR: Rules commented out on Aug 12 2012