SA Bugzilla – Bug 6744
FREEMAIL_REPLYTO False Positives
Last modified: 2011-12-29 20:46:48 UTC
Created attachment 5028
Example email that shows the FP

I noticed an email was hammered by FREEMAIL_REPLYTO. It was hammered because of a freemail address in the body of the email, from a quotation. I've put together a test-case email showing the issue.

1 - I think this rule should be lowered to about 1.0 instead of 2.775 (for net tests). Its RANK is 0.73 (the S/O column is 0.964), and it hits 66 ham messages.

http://ruleqa.spamassassin.org/?daterev=20111228-r1225143-n&rule=FREEMAIL_REPLYTO+&srcpath=&g=Change

MSECS  SPAM%                             HAM%                              S/O    RANK  SCORE  NAME
0      0.9331 (2011 of 215512 messages)  0.0346 (66 of 190519 messages)    0.964  0.73  3.26   FREEMAIL_REPLYTO

2 - I wonder if this rule can be split into TWO rules. Right now it is: "Reply-To/From or Reply-To/body contain different freemails". Handling the Reply-To/body case as a separate rule would likely be better for FPs.

3 - The S/O on FREEMAIL_ENVFROM_END_DIGIT also worries me; that rule should probably be lowered as well:

MSECS  SPAM%                             HAM%                              S/O    RANK  SCORE  NAME
0      0.6946 (1497 of 215512 messages)  0.6089 (1160 of 190519 messages)  0.533  0.47  0.10   FREEMAIL_ENVFROM_END_DIGIT
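The SPAM%, HAM% and S/O columns can be cross-checked against the raw hit counts. This is a minimal Python sketch, assuming S/O is computed as spam% / (spam% + ham%) so that the different spam and ham corpus sizes cancel out; that formula is an assumption here, although it reproduces both ruleqa rows quoted in this report.

```python
# Assumption: S/O = spam% / (spam% + ham%), computed from hit percentages
# rather than raw counts so that corpus sizes are normalised.

def pct(hits: int, total: int) -> float:
    """Percentage of a corpus that a rule hit."""
    return 100.0 * hits / total


def spam_over_overall(spam_hits: int, spam_total: int,
                      ham_hits: int, ham_total: int) -> float:
    spam_pct = pct(spam_hits, spam_total)
    ham_pct = pct(ham_hits, ham_total)
    return spam_pct / (spam_pct + ham_pct)


# Hit counts copied from the ruleqa rows quoted in this report:
# (spam hits, spam corpus size, ham hits, ham corpus size)
rows = {
    "FREEMAIL_REPLYTO": (2011, 215512, 66, 190519),
    "FREEMAIL_ENVFROM_END_DIGIT": (1497, 215512, 1160, 190519),
}

results = {name: round(spam_over_overall(*counts), 3)
           for name, counts in rows.items()}

for name, so in results.items():
    print(f"{name}: S/O = {so}")  # 0.964 and 0.533 respectively
```

An S/O near 0.5 means the rule hits spam and ham at almost the same rate, which is why FREEMAIL_ENVFROM_END_DIGIT looks weak as a standalone rule.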
Following up:

FREEMAIL_REPLYTO_END_DIGIT  Reply-To freemail username ends in digit
FREEMAIL_ENVFROM_END_DIGIT  Envelope-from freemail username ends in digit

The false positives on these rules (combined 2.3) seem very high as well. The more I look into the freemail rules in general, the more arbitrarily high their scores seem. It seems there should be more meta tests or something, and perhaps masscheck should be allowed to suggest scores for these rules.
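Both END_DIGIT rules look only at the shape of the username. The following simplified Python sketch shows that kind of check. It assumes a tiny, hypothetical freemail domain list in place of the plugin's freemail_domains configuration and is not the FreeMail plugin's actual code.

```python
import re

# Hypothetical, tiny domain list for illustration only; the real FreeMail
# plugin builds its list from freemail_domains entries in the rules.
FREEMAIL_DOMAINS = {"yahoo.com", "gmail.com", "hotmail.com"}

# Username (local part) whose last character is an ASCII digit.
ENDS_IN_DIGIT = re.compile(r"[0-9]$")


def freemail_user_ends_in_digit(address: str) -> bool:
    """True if the address is at a listed freemail domain and its username ends in a digit."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or domain.lower() not in FREEMAIL_DOMAINS:
        return False
    return bool(ENDS_IN_DIGIT.search(local))


print(freemail_user_ends_in_digit("jsmith1985@yahoo.com"))  # True
print(freemail_user_ends_in_digit("jsmith@yahoo.com"))      # False
print(freemail_user_ends_in_digit("sales2@example.org"))    # False: not freemail
```

Nothing in such a check distinguishes a legitimate user whose username happens to end in a digit from a throwaway account, which is consistent with the near-even spam/ham split reported for FREEMAIL_ENVFROM_END_DIGIT.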
Try adding the body addresses to freemail_whitelist, or tell senders not to put @ addresses in the body. Big hint :-)
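The following Python sketch shows how whitelisting a body address would interact with the Reply-To/body case. It also expresses the split proposed in the original report as two separate flags. It is a hypothetical model built from the rule description quoted in the report ("Reply-To/From or Reply-To/body contain different freemails"), not the FreeMail plugin's implementation. The domain list, the `whitelist` parameter (standing in for freemail_whitelist), and the address regex are all assumptions.

```python
import re

# Hypothetical stand-in for the freemail_domains configuration.
FREEMAIL_DOMAINS = {"yahoo.com", "gmail.com", "hotmail.com"}

# Deliberately simple address pattern, nowhere near RFC 5322 complete.
ADDR_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def is_freemail(addr: str, whitelist: frozenset = frozenset()) -> bool:
    """Freemail domain and not whitelisted (whitelisted means 'treat as not freemail')."""
    addr = addr.lower()
    domain = addr.rpartition("@")[2]
    return addr not in whitelist and domain in FREEMAIL_DOMAINS


def replyto_checks(from_addr: str, reply_to: str, body: str,
                   whitelist: frozenset = frozenset()) -> dict:
    """Evaluate the Reply-To/From and Reply-To/body cases as separate flags."""
    reply_to = reply_to.lower()
    if not is_freemail(reply_to, whitelist):
        return {"replyto_vs_from": False, "replyto_vs_body": False}
    body_freemails = {a.lower() for a in ADDR_RE.findall(body)
                      if is_freemail(a, whitelist)}
    return {
        # Header-only case: a freemail Reply-To differs from a freemail From.
        "replyto_vs_from": is_freemail(from_addr, whitelist)
                           and from_addr.lower() != reply_to,
        # Body case: some other freemail address appears in the body.
        "replyto_vs_body": bool(body_freemails - {reply_to}),
    }


# A Yahoo user quoting another Yahoo user, as in the attached example.
body = "On Monday, bob@yahoo.com wrote:\n> lunch on Friday?\n"
print(replyto_checks("alice@yahoo.com", "alice@yahoo.com", body))
# {'replyto_vs_from': False, 'replyto_vs_body': True}
print(replyto_checks("alice@yahoo.com", "alice@yahoo.com", body,
                     whitelist=frozenset({"bob@yahoo.com"})))
# {'replyto_vs_from': False, 'replyto_vs_body': False}
```

In this model, only the body flag fires on the quoting scenario. Scoring the two flags separately would therefore let the header-only case keep a higher score.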
The Message-ID is from Yahoo, so where is the DKIM header? I hope this is just a bad example; or does Yahoo not DKIM-sign all mail? If the real-life example has DKIM, why not whitelist_from sender@example.org?
(In reply to comment #3)
> message-id is yahoo, where is dkim header ?, hope is just a bad example, or
> does yahoo not dkim sign all mail ?
>
> if real life example have dkim, why not whitelist_from sender@example.org ?

This is a heavily munged email meant only to show the key point: if a Yahoo user sends an email quoting another Yahoo sender, the rule hits with a score of 2.775, for example. Whitelisting is not appropriate. I think the FREEMAIL rules are likely to produce a lot of FPs.
So masscheck scores must not produce FPs even when whitelisting is possible? IMHO, an FP is a sign of a missing corpus.
(In reply to comment #1)
> Following up:
>
> FREEMAIL_REPLYTO_END_DIGIT Reply-To freemail username ends in digit
> FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit
>
> the false positives on these rules (combined 2.3) seem very high as well. The
> freemail rules in general seem arbitrarily high the more I look into them.
>
> it seems that their should be more meta tests or something and perhaps let
> masscheck suggest some scores for these rules.

Agreed, given the nature of freemail services.
(In reply to comment #5)
> so masscheck scores must not give fp even if whitelistning is possible ?

The scores currently in place are hard-coded overrides of masscheck. To me, whitelisting and blacklisting are worst-case measures; algorithms should be the focus.

> fp is imho sign of missing corpus

I don't know what you mean, as these rules and their scores are being forced active. I am not 100% certain, but I question whether they would be promoted at all.
I have also seen what appears to be a false positive caused by someone making an inline forward of a message that had a Cc: to another freemail account. I asked the customer for a better spample about a week ago and have not yet received it, so I had not raised it as a bug. On the other hand, FREEMAIL_REPLYTO has been a very fruitful rule for us in production, especially in metas. I'd hate to see it go away completely. The rule could probably be improved by excluding email addresses that appear in a forwarded message header, i.e. by looking for To: or CC: near the beginning of a body line with a Subject: line within a few lines after it.
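The forwarded-header heuristic suggested in the previous comment could look roughly like the following Python sketch. Several details are assumptions made for illustration and do not come from the thread. These include the window size of four lines, the tolerance for leading ">" quote markers, the address regex, and all function names.

```python
import re

# Assumption: "within a few lines" is taken to mean the next 4 lines.
SUBJECT_WINDOW = 4

# To:/Cc: or Subject: at the start of a line, optionally after "> " quote markers.
HEADER_RE = re.compile(r"^\s*(?:>\s*)*(?:to|cc):", re.IGNORECASE)
SUBJECT_RE = re.compile(r"^\s*(?:>\s*)*subject:", re.IGNORECASE)
ADDR_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def forwarded_header_addresses(body: str) -> set:
    """Addresses on To:/Cc: body lines that are followed by Subject: within the window."""
    lines = body.splitlines()
    found = set()
    for i, line in enumerate(lines):
        if not HEADER_RE.match(line):
            continue
        following = lines[i + 1:i + 1 + SUBJECT_WINDOW]
        if any(SUBJECT_RE.match(nxt) for nxt in following):
            found.update(a.lower() for a in ADDR_RE.findall(line))
    return found


def body_addresses_outside_forwarded_headers(body: str) -> set:
    """All body addresses except those inside an apparent forwarded header block."""
    all_addrs = {a.lower() for a in ADDR_RE.findall(body)}
    return all_addrs - forwarded_header_addresses(body)


body = (
    "FYI, see below.\n"
    "\n"
    "From: Carol <carol@yahoo.com>\n"
    "Sent: Monday\n"
    "To: alice@yahoo.com\n"
    "Cc: dave@hotmail.com\n"
    "Subject: lunch\n"
    "\n"
    "Write to erin@gmail.com for details.\n"
)
print(sorted(body_addresses_outside_forwarded_headers(body)))
# ['carol@yahoo.com', 'erin@gmail.com']
```

The forwarded From: line is left alone here because the suggestion mentions only To: and CC:. Whether to exclude From: as well would be a separate decision.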
According to bug 6394, this is not the first time this has been discussed. Here are the current scores for FREEMAIL:

#FREEMAIL SCORES
score FREEMAIL_FORGED_REPLYTO 1.199 2.503 1.204 2.095
score FREEMAIL_REPLY 2.499 2.499 1.788 1.929
score FREEMAIL_REPLYTO 3.257 2.775 1.811 2.398
score FREEMAIL_REPLYTO_END_DIGIT 1.221 0.980 1.179 1.151
# Bug 6394, score 1.5 is too high, depends on local traffic
score FREEMAIL_ENVFROM_END_DIGIT 0.1
score FREEMAIL_FROM 0.001

I'm making the following changes, but I'm open to others tweaking the plugin, tweaking the scores, or even disabling the rules and putting them into masscheck for auto-promotion (or not, as the case may be).

#FREEMAIL SCORES - Scores lowered per bug 6744
score FREEMAIL_FORGED_REPLYTO 1.199 2.503 1.204 2.095
score FREEMAIL_REPLY 1.0
score FREEMAIL_REPLYTO 1.0
score FREEMAIL_REPLYTO_END_DIGIT 0.25
score FREEMAIL_ENVFROM_END_DIGIT 0.25
score FREEMAIL_FROM 0.001

In short, the above scores are possibly just a band-aid, but they should mitigate an issue we know is occurring. For example, the FP I mentioned at the start was completely legitimate but scored 5.1 points JUST from the freemail rules. That's almost a poison-pill rule.

I think these could be improved with meta rules, but someone would need to pick up that baton. I want to focus on 3.4.0 blockers.

svn commit -m 'Grouping Freemail scores and lowering then substantially per bug 6744'
Sending rules/50_scores.cf
Transmitting file data .
Committed revision 1225646.
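In SpamAssassin's score directive, the four numbers on a line correspond to the four score sets: 0 is no Bayes and no network tests, 1 is network tests only, 2 is Bayes only, and 3 is Bayes plus network tests. A single value applies to all four sets. The following Python sketch parses both blocks above and compares set 1, the "net tests" set behind the 2.775 figure, for one message. The hit combination is hypothetical, chosen only for illustration; the thread does not say which FREEMAIL rules the reported FP actually hit.

```python
# Parse the FREEMAIL score lines quoted in this comment and compare
# score set 1 (network tests on, Bayes off) before and after the change.

OLD = """\
#FREEMAIL SCORES
score FREEMAIL_FORGED_REPLYTO 1.199 2.503 1.204 2.095
score FREEMAIL_REPLY 2.499 2.499 1.788 1.929
score FREEMAIL_REPLYTO 3.257 2.775 1.811 2.398
score FREEMAIL_REPLYTO_END_DIGIT 1.221 0.980 1.179 1.151
# Bug 6394, score 1.5 is too high, depends on local traffic
score FREEMAIL_ENVFROM_END_DIGIT 0.1
score FREEMAIL_FROM 0.001
"""

NEW = """\
#FREEMAIL SCORES - Scores lowered per bug 6744
score FREEMAIL_FORGED_REPLYTO 1.199 2.503 1.204 2.095
score FREEMAIL_REPLY 1.0
score FREEMAIL_REPLYTO 1.0
score FREEMAIL_REPLYTO_END_DIGIT 0.25
score FREEMAIL_ENVFROM_END_DIGIT 0.25
score FREEMAIL_FROM 0.001
"""

NET_NO_BAYES = 1  # index of score set 1


def parse_scores(text: str) -> dict:
    """Map rule name -> [set0, set1, set2, set3]; a single value fills all four sets."""
    scores = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] != "score":
            continue  # skip comments and blank lines
        values = [float(v) for v in parts[2:]]
        if len(values) == 1:
            values *= 4
        elif len(values) != 4:
            raise ValueError(f"expected 1 or 4 scores: {line!r}")
        scores[parts[1]] = values
    return scores


# Hypothetical hit combination, for illustration only.
HITS = ["FREEMAIL_FROM", "FREEMAIL_REPLYTO", "FREEMAIL_REPLY"]

old, new = parse_scores(OLD), parse_scores(NEW)
old_total = sum(old[rule][NET_NO_BAYES] for rule in HITS)
new_total = sum(new[rule][NET_NO_BAYES] for rule in HITS)
print(f"set 1 total: {old_total:.3f} -> {new_total:.3f}")  # 5.275 -> 2.001
```

Because FREEMAIL_REPLY, FREEMAIL_REPLYTO and the two END_DIGIT rules now carry a single value, they score the same in every set. FREEMAIL_FORGED_REPLYTO keeps its per-set values.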