6744 – FREEMAIL_REPLYTO False Positives

Status:	RESOLVED FIXED

Alias:	None

Product:	Spamassassin
Classification:	Unclassified
Component:	Rules
Version:	SVN Trunk (Latest Devel Version)
Hardware:	PC Windows 7

Importance:	P2 normal
Target Milestone:	Undefined
Assignee:	SpamAssassin Developer Mailing List

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2011-12-29 17:17 UTC by Kevin A. McGrail
Modified:	2011-12-29 20:46 UTC
CC List:	4 users

Attachment	Type	Modified	Status	Actions	Submitter/CLA Status
Example email that shows the FP	text/plain			None	Kevin A. McGrail

Description Kevin A. McGrail 2011-12-29 17:17:06 UTC

Created attachment 5028
Example email that shows the FP

I noticed an email was hammered by the FREEMAIL_REPLYTO rule.

It was hammered because of a freemail address that appeared in a quotation in the body of the email.  I've put together a test case email showing the issue.

1 - I think this rule's score needs to be lowered to ~1.0 instead of 2.775 (for net tests).

The rank is 0.73 (the S/O is 0.964).

http://ruleqa.spamassassin.org/?daterev=20111228-r1225143-n&rule=FREEMAIL_REPLYTO+&srcpath=&g=Change

MSECS  SPAM%                             HAM%                              S/O    RANK  SCORE  NAME              WHO/AGE
0      0.9331 (2011 of 215512 messages)  0.0346 (66 of 190519 messages)    0.964  0.73  3.26   FREEMAIL_REPLYTO


2 - I wonder if this rule can be split into TWO rules. Right now, it is:

Reply-To/From or Reply-To/body contain different freemails

Making the Reply-To/body check a separate rule would likely reduce FPs.


3 - The S/O on FREEMAIL_ENVFROM_END_DIGIT worries me too; I think that rule's score should be lowered as well:

MSECS  SPAM%                             HAM%                              S/O    RANK  SCORE  NAME                        WHO/AGE
0      0.6946 (1497 of 215512 messages)  0.6089 (1160 of 190519 messages)  0.533  0.47  0.10   FREEMAIL_ENVFROM_END_DIGIT
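
For reference, the S/O values in the two ruleqa rows quoted in this description can be reproduced from the SPAM% and HAM% columns. This is a minimal sketch, assuming S/O is computed as spam% / (spam% + ham%); the helper name `s_over_o` is made up for illustration and is not part of ruleqa.

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hits that land on spam, from the ruleqa percentage columns.

    Assumption: S/O = spam% / (spam% + ham%).
    """
    return spam_pct / (spam_pct + ham_pct)


# Percentages copied from the two ruleqa rows in this report.
rows = {
    "FREEMAIL_REPLYTO": (0.9331, 0.0346),
    "FREEMAIL_ENVFROM_END_DIGIT": (0.6946, 0.6089),
}

for name, (spam_pct, ham_pct) in rows.items():
    print(f"{name}: S/O = {s_over_o(spam_pct, ham_pct):.3f}")
```

Under that assumption the results match the S/O column in both rows. A value near 0.5, as for FREEMAIL_ENVFROM_END_DIGIT, means the rule hits about the same fraction of the ham corpus as of the spam corpus.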

Comment 1 Kevin A. McGrail 2011-12-29 17:26:19 UTC

Following up:

FREEMAIL_REPLYTO_END_DIGIT Reply-To freemail username ends in digit
FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit

The false positives on these rules (combined 2.3) seem very high as well.  The freemail rules in general seem arbitrarily high the more I look into them.

It seems that there should be more meta tests or something, and perhaps we should let masscheck suggest some scores for these rules.

Comment 2 Benny Pedersen 2011-12-29 18:04:00 UTC

Try adding the body addresses to freemail_whitelist, or tell senders not to put @ in the body, big hint :-)

Comment 3 Benny Pedersen 2011-12-29 18:09:40 UTC

The Message-ID is Yahoo; where is the DKIM header? I hope this is just a bad example, or does Yahoo not DKIM-sign all mail?

If the real-life example has DKIM, why not whitelist_from sender@example.org?

Comment 4 Kevin A. McGrail 2011-12-29 18:13:05 UTC

(In reply to comment #3)
> The Message-ID is Yahoo; where is the DKIM header? I hope this is just a bad
> example, or does Yahoo not DKIM-sign all mail?
> 
> If the real-life example has DKIM, why not whitelist_from sender@example.org?

This is a heavily munged email, just to show the key point: if a Yahoo user sends an email quoting another Yahoo sender, the rule hits with a score of 2.775, for example.

Whitelisting is not appropriate.  I think the FREEMAIL rules might have a lot of likely FPs.

Comment 5 Benny Pedersen 2011-12-29 18:26:25 UTC

So masscheck scores must not give FPs even if whitelisting is possible?

An FP is IMHO a sign of a missing corpus.

Comment 6 John Hardin 2011-12-29 18:29:35 UTC

(In reply to comment #1)
> Following up:
> 
> FREEMAIL_REPLYTO_END_DIGIT Reply-To freemail username ends in digit
> FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit
> 
> The false positives on these rules (combined 2.3) seem very high as well.  The
> freemail rules in general seem arbitrarily high the more I look into them.
> 
> It seems that there should be more meta tests or something, and perhaps we
> should let masscheck suggest some scores for these rules.

Agreed, given the nature of freemail services.

Comment 7 Kevin A. McGrail 2011-12-29 20:13:03 UTC

(In reply to comment #5)
> So masscheck scores must not give FPs even if whitelisting is possible?

The scores currently in place are hard-coded overrides of masscheck.

To me, whitelisting and blacklisting are worst-case scenarios, and algorithms should be the focus.

> An FP is IMHO a sign of a missing corpus.

I don't know what you mean, as the scores and these rules are being forced active.  I am not 100% certain, but I question whether they would be promoted.

Comment 8 Daniel J McDonald 2011-12-29 20:24:25 UTC

I have also seen what appears to be a false positive based on someone making an inline-forward of a message that had a cc: to another freemail account.  I asked the customer for a better spample about a week ago and have not yet received it, so I had not raised it as a bug.

On the other hand, FREEMAIL_REPLYTO has been a very fruitful rule for us in production, especially in metas.  I'd hate to see it go away completely.

The rule could probably be improved by excluding email addresses that appear to be in a forwarded message header, for example by looking for To: or CC: near the beginning of a body line, with Subject: within the next few lines.
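
As a rough illustration of that heuristic, here is a minimal Python sketch; it is not how the FreeMail plugin works. The function name `body_addresses`, the regular expressions, and the four-line window are all assumptions made for this example. It skips addresses on To:/Cc: lines that are followed by a Subject: line within a few lines, and collects every other address in the body.

```python
import re

# Hypothetical patterns; a real rule would need tuning against actual forwards.
HEADER_LINE = re.compile(r"^[>\s]*(To|Cc):", re.IGNORECASE)
SUBJECT_LINE = re.compile(r"^[>\s]*Subject:", re.IGNORECASE)
ADDRESS = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def body_addresses(body: str, window: int = 4) -> list[str]:
    """Return addresses found in the body, skipping To:/Cc: lines that look
    like part of a forwarded header (Subject: within `window` lines after)."""
    lines = body.splitlines()
    found = []
    for i, line in enumerate(lines):
        if HEADER_LINE.match(line):
            following = lines[i + 1 : i + 1 + window]
            if any(SUBJECT_LINE.match(nxt) for nxt in following):
                continue  # looks like a forwarded header line; ignore it
        found.extend(ADDRESS.findall(line))
    return found


sample = (
    "Hi,\n\n-----Original Message-----\n"
    "From: Alice\nTo: bob@example.com\nCc: carol@example.net\n"
    "Subject: lunch\n\nWrite me at dave@example.org\n"
)
print(body_addresses(sample))
```

The sketch does not handle folded (multi-line) To:/Cc: headers or localized header names, so it would still let some forwarded addresses through.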

Comment 9 Kevin A. McGrail 2011-12-29 20:46:48 UTC

According to bug 6394, this is not the first time this has been discussed.

Here are the current scores for FREEMAIL:

#FREEMAIL SCORES
score FREEMAIL_FORGED_REPLYTO 1.199 2.503 1.204 2.095
score FREEMAIL_REPLY 2.499 2.499 1.788 1.929
score FREEMAIL_REPLYTO 3.257 2.775 1.811 2.398
score FREEMAIL_REPLYTO_END_DIGIT 1.221 0.980 1.179 1.151
# Bug 6394, score 1.5 is too high, depends on local traffic
score FREEMAIL_ENVFROM_END_DIGIT 0.1
score FREEMAIL_FROM     0.001


I'm making the following changes, but I'm open to others tweaking the plugin, tweaking the scores, or even disabling the rules and putting them into masscheck for auto-promotion (or not, as the case may be).

#FREEMAIL SCORES - Scores lowered per bug 6744
score FREEMAIL_FORGED_REPLYTO 1.199 2.503 1.204 2.095
score FREEMAIL_REPLY 1.0
score FREEMAIL_REPLYTO 1.0
score FREEMAIL_REPLYTO_END_DIGIT 0.25
score FREEMAIL_ENVFROM_END_DIGIT 0.25
score FREEMAIL_FROM     0.001
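
To compare the old and new values for one score set, the `score` lines can be parsed with a short script. This is a sketch under stated assumptions: the four values on a `score` line are taken to be score sets 0 to 3 (0: no Bayes, no net tests; 1: no Bayes, net tests; 2: Bayes, no net tests; 3: Bayes and net tests), and a single value applies to all four sets. `parse_scores` is a hypothetical helper, not part of SpamAssassin.

```python
def parse_scores(cf_text: str) -> dict[str, list[float]]:
    """Map rule name -> four scores, one per score set (0..3)."""
    scores = {}
    for raw in cf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        parts = line.split()
        if len(parts) < 3 or parts[0] != "score":
            continue
        values = [float(v) for v in parts[2:]]
        if len(values) == 1:
            values = values * 4  # a single score applies to every set
        if len(values) != 4:
            continue  # this sketch only handles one or four values
        scores[parts[1]] = values
    return scores


OLD = """
score FREEMAIL_FORGED_REPLYTO 1.199 2.503 1.204 2.095
score FREEMAIL_REPLY 2.499 2.499 1.788 1.929
score FREEMAIL_REPLYTO 3.257 2.775 1.811 2.398
score FREEMAIL_REPLYTO_END_DIGIT 1.221 0.980 1.179 1.151
score FREEMAIL_ENVFROM_END_DIGIT 0.1
score FREEMAIL_FROM     0.001
"""

NEW = """
score FREEMAIL_FORGED_REPLYTO 1.199 2.503 1.204 2.095
score FREEMAIL_REPLY 1.0
score FREEMAIL_REPLYTO 1.0
score FREEMAIL_REPLYTO_END_DIGIT 0.25
score FREEMAIL_ENVFROM_END_DIGIT 0.25
score FREEMAIL_FROM     0.001
"""

NET_NO_BAYES = 1  # assumed index of the "net tests, no Bayes" score set

old, new = parse_scores(OLD), parse_scores(NEW)
for rule in new:
    print(f"{rule}: {old[rule][NET_NO_BAYES]} -> {new[rule][NET_NO_BAYES]}")
```

Index 1 is used here because the 2.775 figure quoted for net tests is the second value on the old FREEMAIL_REPLYTO line.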

In short, the above scores are possibly just a band-aid, but they should mitigate an issue we know is occurring.  For example, the FP I mentioned at the start was completely legit but fired 5.1 points JUST from the Freemail rules.  That's almost a poison pill rule.

I think these could be improved with meta rules but someone would need to pick up that baton.  I want to focus on 3.4.0 blockers.

svn commit -m 'Grouping Freemail scores and lowering then substantially per bug 6744'
Sending        rules/50_scores.cf
Transmitting file data .
Committed revision 1225646.