Bug 6534 - Evaluate UCEPROTECT
Summary: Evaluate UCEPROTECT
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: SVN Trunk (Latest Devel Version)
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-01-18 18:20 UTC by Warren Togami
Modified: 2012-08-13 08:04 UTC



Description Warren Togami 2011-01-18 18:20:30 UTC
http://www.uceprotect.net

The owners of UCEPROTECT have requested testing in our weekly masscheck.  I will add this to the weekly masscheck with "tflags nopublish" if there are no objections by Friday of this week.

This is NOT a proposal to add this or any particular blacklist to production rules.  I anticipate that tests like this will be only temporary, to be removed in a few weeks if performance is shown to be bad.

I intend these tests to either identify promising blacklists, or (more likely) be able to warn people away from blacklists with actual numbers.
Comment 1 D. Stussy 2011-01-18 21:36:00 UTC
I include these in my local configuration file with scores from 0.3 to 0.1 for the dnsbl-1 to dnsbl-3 lists respectively.  I have not seen many hits.  However, I have seen several hits on UCE-Prot's "backscatterer" list, ips.backscatterer.org.  You may want to add this fourth list to your test.
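For readers unfamiliar with how these lists are queried, here is a minimal sketch of how a DNSBL lookup name is usually formed: the IPv4 octets are reversed and the list's zone is appended. The zone names assume the ".net" suffix for the three UCEPROTECT levels plus the backscatterer zone named above; the helper name `dnsbl_query_name` and the sample address are purely illustrative, and no DNS query is actually sent.

```python
import ipaddress

# Zones discussed in this bug. The ".net" suffix for the UCEPROTECT levels
# is an assumption here; verify it against the list operator's documentation.
UCEPROTECT_ZONES = [
    "dnsbl-1.uceprotect.net",
    "dnsbl-2.uceprotect.net",
    "dnsbl-3.uceprotect.net",
    "ips.backscatterer.org",
]


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Return the DNS name a client would look up for an IPv4 address.

    Hypothetical helper for illustration: it only builds the name and
    performs no network lookup.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address (RFC 5737), used as a sample.
    for zone in UCEPROTECT_ZONES:
        print(dnsbl_query_name("192.0.2.10", zone))
```

A listed address would typically resolve to an A record in 127.0.0.0/8, while an unlisted one returns NXDOMAIN; what each list returns exactly is up to its operator.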
Comment 2 Henrik Krohns 2011-01-19 01:50:24 UTC
Note that the backscatterer list is obviously pointless on its own. It should at least be combined in a meta rule with BOUNCE_MESSAGE, if not something even more specific.
Comment 3 Henrik Krohns 2011-01-19 01:54:11 UTC
(In reply to comment #2)
> Note that obviously the backscatterer list is pointless on its own. It should
> be at least meta'd with BOUNCE_MESSAGE if not even more specifically.

Then again, I'm not even sure of that. It's really only useful for blocking at the MTA level:

http://www.backscatterer.org/?target=usage

There's not much point in "amplifying" the score for an already known BOUNCE_MESSAGE, and wasting lookups on that.
Comment 4 D. Stussy 2011-01-19 15:33:54 UTC
...But if one has reached the point where one is scoring a message in SA (having not previously decided to reject it for other reasons), then any element which scores is NOT being used by itself - and therefore, in this specific case, the backscatterer list is not standing on its own but merely contributing to the overall score.

I have found that when it does hit, usually the message would be classified as spam without the list's contributing points.  As for the amplification argument, I disagree:  It is better to have redundant confirmation than to have a failure with the primary determining factors while bypassing the redundancy thus resulting in an incorrect classification.

For Warren Togami:  I sent you the private mail.  The problem you had with your reply bouncing is that gmail does not generate proper "Received:" headers.  They claim "with smtp" but omit the "from" clause, which violates the syntax required by RFC 5321 and its predecessors (with 821 & 2821 being standard).
Comment 5 Warren Togami 2011-01-21 19:03:52 UTC
> For Warren Togami:  I sent you the private mail.  The problem you had with your
> reply bouncing is that gmail does not generate proper "Received:" headers. 
> They claim "with smtp" but omit the "from" clause - a violation of RFC 5321 and
> its predecessors (with 821 & 2821 being standard) required syntax.

BTW, I never heard a response to that mail (reposted to dev@).  Some of the example DNSBLs you sent me are either broken or incorrect.

http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/wtogami/20_ubl.cf?revision=1060616&view=co
For example, ubl AFAICT is timing out most of the time, making it effectively unusable, and when it used to work its performance and FP rate were terrible.

http://www.uceprotect.net/en/index.php?m=6&s=11
Your UCEPROTECT examples had uceprotect.com, but isn't it supposed to be .net?  It appears that .net works here while .com had no hit.
Comment 6 Warren Togami 2011-01-21 21:09:58 UTC
Adding         wtogami/20_bug_6534_uceprotect.cf
Transmitting file data .
Committed revision 1062090.
Comment 7 Darxus 2011-03-01 13:03:43 UTC
"That means the UCEPROTECT-Network will gladfully ignore [the ASRG] and it's decisions or votings from now." - http://www.ietf.org/mail-archive/web/asrg/current/msg16741.html
Today.

Nice of them to announce their intention to ignore the official authority on spam mitigation.
Comment 8 Jason Bertoch 2011-03-01 14:16:37 UTC
I've been following this thread as well and am disappointed in all parties involved.  While I have a hard time defending Claus after the above (and related) threads, I don't think the IETF has any business dictating a revenue model.  Charging for an expedited version of an otherwise free service seems completely rational.

However, with the above pettiness aside, SA is the perfect place for UCEPROTECT in a mail environment.  Their revenue model and policies don't have to become ours any more than we want them to.  In addition, UCEPROTECT's decision not to follow BCP07 is entirely within their rights.  It's a BCP, not an RFC.  Not that RFCs are required to be followed either, but adherence to them is generally stricter.

In addition, I've had fair success with local UCEPROTECT rules and would like to see how they compare against the general corpus.
Comment 9 Warren Togami 2011-03-01 14:37:59 UTC
I've operated my entire anti-spam career largely ignoring the "authorities", instead attempting to use measurement and statistics to find out for myself.  Any authority can say whatever it wants about blacklist standards, but what matters most is how useful and safe a blacklist is as measured.

That being said...

http://ruleqa.spamassassin.org/20110226-r1074804-n
UCEPROTECT_L1 is one of the worst performing of the DNSBL's were currently testing.

http://www.spamtips.org/2011/01/dnsbl-safety-report-1232011.html
http://ruleqa.spamassassin.org/20110226-r1074804-n/T_RCVD_IN_UCEPROTECT_L1/detail
It has very high overlaps with MSPIKE_BL at 80%, PSBL at 73% and HOSTKARMA_BL at 89%.  Yet despite its similarities with those high safety rated blacklists, 2% of our ham corpus from the past week hit in UCEPROTECT_L1, and this has been pretty consistent for the previous four weeks.

As with the ampr.org fellow above, who suggested typo-broken SpamAssassin rules containing UCEPROTECT, I'm afraid much anti-spam advice out there is given without adequate testing or even a cursory examination of the statistics.
Comment 10 D. Stussy 2011-03-01 15:01:54 UTC
I suggested it FOR TESTING.  I simply noted that it was not among the usual DNSBL suspects that SA normally scores.  I didn't say that it was good; only that it was missing.
Comment 11 Warren Togami 2011-03-01 15:06:27 UTC
> I suggested it FOR TESTING.  I simply noted that it was not among the usual
> DNSBL suspects that SA normally scores.  I didn't say that it was good; only
> that it was missing.

I'm sorry for the misunderstanding.  I was somehow under the impression that your example rules were what you used in production (thus tested) and I copied your rules verbatim at first.  This became alarming as UBL is utterly broken at the server and UCEPROTECT had syntax errors.
Comment 12 Kevin A. McGrail 2011-03-31 14:10:02 UTC
(In reply to comment #11)
> > I suggested it FOR TESTING.  I simply noted that it was not among the usual
> > DNSBL suspects that SA normally scores.  I didn't say that it was good; only
> > that it was missing.
> 
> I'm sorry for the misunderstanding.  I was somehow under the impression that
> your example rules were what you used in production (thus tested) and I copied
> your rules verbatim at first.  This became alarming as UBL is utterly broken at
> the server and UCEPROTECT had syntax errors.

I have to say that after running into an IP on the UCEPROTECT-powered backscatterer.org list, I was shocked to see they charge money for delistings.

I consider this to be unethical.  It's akin to paying to be allowed to spam, and it severely harms their credibility.

Something to be considered.

Regards,
KAM
Comment 13 D. Stussy 2011-03-31 21:15:13 UTC
IMO, charging for [express] delisting is akin to extortion and may be illegal in some jurisdictions around the world.  However, the reason why some IP address is listed in the first place is the demonstration of merit in considering the list.
Comment 14 Kevin A. McGrail 2011-04-01 10:07:23 UTC
(In reply to comment #13)
> IMO, charging for [express] delisting is akin to extortion and may be illegal
> in some jurisdictions around the world.  However, the reason why some IP
> address is listed in the first place is the demonstration of merit in
> considering the list.

True, but a comment such as Warren's above, that this is "one of the worst performing of the DNSBL's were [sic] currently testing," means it fails on technical merit as well as on ethical considerations.  I think we can close this ticket as "evaluated" and "failed".  Warren?

Regards,
KAM
Comment 15 Karsten Bräckelmann 2011-04-01 12:39:56 UTC
(In reply to comment #14)
> [...] fails on technical merit as well as ethical consideration.
> I think we can close this ticket as "evaluated" and "failed".

+1
Comment 16 AXB 2011-04-01 13:37:45 UTC
+1
Comment 17 Warren Togami 2011-04-02 01:31:13 UTC
+1 but could we please fix ruleqa first so we can record some final data before it is removed from masscheck?
Comment 18 Darxus 2011-04-12 02:17:36 UTC
Time to close this?

          SPAM%     HAM%     S/O      RANK   SCORE  NAME  
20110319  30.5217   0.4457   0.986    0.73    0.00  T_RCVD_IN_UCEPROTECT_L1
20110409  43.1848   0.6689   0.985    0.70    0.00  T_RCVD_IN_UCEPROTECT_L1
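
For anyone reading the table above, the S/O column can be reproduced from the SPAM% and HAM% columns. The following is a minimal sketch, assuming S/O is simply spam% / (spam% + ham%); that assumption matches both rows shown here, but the function name `s_over_o` is illustrative and not part of any SpamAssassin tool.

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hit rate that falls on spam rather than ham.

    Assumed formula: spam% / (spam% + ham%). A value near 1.0 means the
    rule hits almost only spam; 0.5 means it says nothing either way.
    """
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit no messages, S/O is undefined")
    return spam_pct / total


# (date, SPAM%, HAM%) rows copied from the table in this comment.
ROWS = [
    ("20110319", 30.5217, 0.4457),
    ("20110409", 43.1848, 0.6689),
]

if __name__ == "__main__":
    for date, spam_pct, ham_pct in ROWS:
        # Prints 0.986 and 0.985, matching the S/O column above.
        print(date, round(s_over_o(spam_pct, ham_pct), 3))
```

Note that a high S/O can coexist with an uncomfortable ham hit rate: S/O is a ratio, so the HAM% column still has to be read on its own.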
Comment 19 Warren Togami 2011-04-12 04:02:22 UTC
http://ruleqa.spamassassin.org/20110122-r1062119-n/T_RCVD_IN_UCEPROTECT_L1/detail
Results from 1/22/2011
http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_UCEPROTECT_L1/detail
Results from 4/9/2011

* Recent results indicate 1.3% FP rate on the past week of ham, ranging up to 2.5% FP's in recent weeks.  This is worse than our measurements in January.
* 88% spam overlap with RCVD_IN_MSPIKE_BL.  (Most likely the only new DNSBL to be added to spamassassin-3.4.  Look at its stellar spam detection performance with almost zero FP's!)
* 83% spam overlap with RCVD_IN_HOSTKARMA_BL.
* 63% spam overlap with RCVD_IN_PSBL.
* 62% spam overlap with RCVD_IN_PBL.

From multiple months of analysis showing roughly the same thing, it seems clear that UCEPROTECT is a little too unsafe and redundant for spamassassin users.

I suppose we can remove it from masscheck for now.  But I'd like to measure it again every few months to see if it has changed any.  I like being able to definitively warn people against stuff like this with fresh data.
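
The overlap percentages listed above can be read as "of the spam that hit this rule, what share also hit the other rule". Here is a minimal sketch of that calculation, assuming each rule's hits are available as a set of message identifiers; the identifiers and the function name `overlap_pct` are made up for illustration, and this is not the ruleqa implementation.

```python
def overlap_pct(rule_hits: set, other_hits: set) -> float:
    """Percentage of rule_hits that also appear in other_hits."""
    if not rule_hits:
        raise ValueError("rule has no hits, overlap is undefined")
    return 100.0 * len(rule_hits & other_hits) / len(rule_hits)


if __name__ == "__main__":
    # Hypothetical toy data: message ids hit by each rule in a spam corpus.
    uceprotect_l1 = {"m1", "m2", "m3", "m4", "m5"}
    mspike_bl = {"m1", "m2", "m3", "m4", "m9"}
    # 4 of the 5 UCEPROTECT_L1 hits are also MSPIKE_BL hits.
    print(overlap_pct(uceprotect_l1, mspike_bl))  # prints 80.0
```

The measure is not symmetric: the share of one rule's hits covered by a second rule can differ from the share of the second rule's hits covered by the first. A rule with a high overlap against safer lists adds little detection of its own, which is the redundancy argument made above.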
Comment 20 Darxus 2011-05-04 17:19:51 UTC
How about closing this bug and leaving the tests in mass-check?
Comment 21 Kevin A. McGrail 2011-05-04 17:23:51 UTC
(In reply to comment #20)
> How about closing this bug and leaving the tests in mass-check?

Agreed.  The votes are clear and masscheck data can be added when available.  SA won't be implementing UCEPROTECT due to the previous results.
Comment 22 AXB 2012-08-12 15:51:18 UTC
I'd like to propose removal/disabling of these tests from sandboxes

( /trunk/rulesrc/sandbox/wtogami/20_bug_6534_uceprotect.cf)

They're set to nopublish (since 2011) and while it's nice to test new BLs, it
puts an unnecessary load on weekly masschecks to keep them there for such a
long time.

comments, votes please!
Comment 23 Kevin A. McGrail 2012-08-12 15:55:40 UTC
+1 Any BLs that are active in sandbox with no one actively reviewing the results should and really must be disabled.  Net masschecks are taking too long on the weekend!
Comment 24 AXB 2012-08-13 08:04:55 UTC
FTR: Rules commented out on Aug 12 2012