SA Bugzilla – Bug 6864
Excessive score (6.1) from FROM_MISSP_URI, FROM_MISSP_EH_MATCH, TO_NO_BRKTS_FROM_MSSP
Last modified: 2018-03-23 22:02:34 UTC
Today I was investigating a false positive for a mail from our Japanese research colleagues at KEK.jp. It turns out that the following perfectly valid From line caused a collection of 6.1 score points solely because there is no (optional) space between a display name and the address:

  From: =?ISO-2022-JP?B?VXNlciBTdXBwb3J0IFN5c3RlbS4=?=<usersys1@ml.post.kek.jp>

The sample message is attached (obfuscated and stripped of irrelevant content; the set of rule hits is kept unchanged).

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.5 RELAY_JP               Relayed through Japan
-0.1 RP_MATCHES_RCVD        Envelope sender domain matches handover relay domain
 1.1 DCC_CHECK              Detected as bulk mail by DCC (dcc-servers.net)
-1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                            [score: 0.0000]
 1.0 FROM_EXCESS_BASE64     From: base64 encoded unnecessarily
 1.5 TO_NO_BRKTS_FROM_MSSP  Multiple formatting errors
 2.5 FROM_MISSP_EH_MATCH    From misspaced, matches envelope
 2.1 FROM_MISSP_URI         From misspaced, has URI
-0.2 AWL                    AWL: From: address is in the auto white-list

I think the combined score for the set of rules FROM_MISSP_EH_MATCH, FROM_MISSP_URI and TO_NO_BRKTS_FROM_MSSP should be capped - a single unusual (but valid) formatting in a From header field should not collect 6.1 points.
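For illustration, the arithmetic behind this request can be sketched in Python. The scores are copied from the rule hits listed in this comment; the helper name `capped_total` and the cap value of 3.0 are hypothetical, chosen only to show the effect, and nothing here is meant to suggest that SpamAssassin offers such a cap as a built-in feature.

```python
# Scores copied from the rule-hit report in this comment.
hits = {
    "RELAY_JP": 0.5,
    "RP_MATCHES_RCVD": -0.1,
    "DCC_CHECK": 1.1,
    "BAYES_00": -1.9,
    "FROM_EXCESS_BASE64": 1.0,
    "TO_NO_BRKTS_FROM_MSSP": 1.5,
    "FROM_MISSP_EH_MATCH": 2.5,
    "FROM_MISSP_URI": 2.1,
    "AWL": -0.2,
}

# The three rules named in this report as firing on the same missing space.
MISSP_GROUP = ("TO_NO_BRKTS_FROM_MSSP", "FROM_MISSP_EH_MATCH", "FROM_MISSP_URI")


def capped_total(hits, group, cap):
    """Sum all scores, letting the rules in `group` contribute at most `cap`.

    Hypothetical helper for illustration only; not a SpamAssassin API.
    """
    group_sum = sum(score for name, score in hits.items() if name in group)
    other_sum = sum(score for name, score in hits.items() if name not in group)
    return other_sum + min(group_sum, cap)


# Round to one decimal to hide binary floating-point noise.
group_sum = round(sum(hits[name] for name in MISSP_GROUP), 1)
total = round(sum(hits.values()), 1)
capped = round(capped_total(hits, MISSP_GROUP, cap=3.0), 1)  # 3.0 is an assumed cap

print(group_sum, total, capped)
```

With these numbers the three rules contribute 6.1 of a 6.5 total; every other hit combined adds only 0.4.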
Created attachment 5111 [details] sample mail message
(In reply to comment #0)
> Today I was investigating a false positive for a mail from our
> Japanese research colleagues at KEK.jp. It turns out that the following
> perfectly valid From line caused a collection of 6.1 score points
> solely because there is no (optional) space between a display name
> and the address:

Granted it's valid, but it's apparently very common in spam generated by sloppy tools and not common in mail from well-written MUAs. Could you get the X-Mailer from the headers so we know what generated that From: line? That header was not in the sample you attached.

Please see if this message (and any other similar FPs) can be added to a masscheck ham corpus so that the generated scores can better reflect this.

I will see about tuning the rules; any change I make will probably be rather focused, either to having an encoded display name with no space, or using that specific mailer.

I also note the lack of a space after the To: - whatever mailer your colleague is using needs some cleanup. :)
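As a rough illustration of the narrower condition (an encoded display name with no space before the address), here is a minimal Python sketch. The pattern and the function name `from_is_misspaced` are hypothetical and are not the actual FROM_MISSP_* rule definitions; the header value is the one from comment #0.

```python
import base64
import re

# Hypothetical pattern, NOT the real FROM_MISSP_* rule: an RFC 2047
# encoded-word immediately followed by "<addr>" with no space in between.
ENCODED_NAME_NO_SPACE = re.compile(
    r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=<[^<>\s]+@[^<>\s]+>"
)


def from_is_misspaced(header_value):
    """Return True if an encoded display name touches the address bracket."""
    return ENCODED_NAME_NO_SPACE.search(header_value) is not None


reported = "=?ISO-2022-JP?B?VXNlciBTdXBwb3J0IFN5c3RlbS4=?=<usersys1@ml.post.kek.jp>"
spaced = "=?ISO-2022-JP?B?VXNlciBTdXBwb3J0IFN5c3RlbS4=?= <usersys1@ml.post.kek.jp>"

print(from_is_misspaced(reported))  # True
print(from_is_misspaced(spaced))    # False

# The base64 payload is plain ASCII, consistent with the FROM_EXCESS_BASE64 hit.
display_name = base64.b64decode("VXNlciBTdXBwb3J0IFN5c3RlbS4=").decode("iso2022_jp")
print(display_name)  # User Support System.
```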
> Granted it's valid, but it's apparently very common in spam generated by
> sloppy tools and not common in mail from well-written MUAs. Could you get
> the X-Mailer from the headers so we know what generated that From: line?
> That header was not in the sample you attached.

The attached header is pretty much complete; I just replaced the username with xxx and yyy, simplified the Subject, and dropped local Received entries. Unfortunately there was no X-Mailer or User-Agent header field.

> I also note the lack of a space after the To: - whatever mailer your
> colleague is using needs some cleanup. :)

Indeed there was no (optional) space after the To: . The message is a user registration procedure confirmation, apparently generated by some automation script or some HR management software. The body is all in ASCII plain text and doesn't reveal much about how it was created; it just states their Users Office contact address, a password and a few instructions.

> Please see if this message (and any other similar FPs) can be added to a
> masscheck ham corpus so that the generated scores can better reflect this.

Not sure if I can reveal more of the message beyond what was attached.

> I will see about tuning the rules; any change I make will probably be rather
> focused, either to having an encoded display name with no space, or using
> that specific mailer.

I'm not complaining about the scores of each of these three rules individually, but their cumulative effect seems excessive for a simple unusual formatting.
Changing SpamAssassin for a single false positive clearly caused by poorly written software seems weird. But I think we could still do some work on improving the automated handling of overlapping rules.
(In reply to comment #3)
> > Please see if this message (and any other similar FPs) can be added to a
> > masscheck ham corpus so that the generated scores can better reflect this.
>
> Not sure if I can reveal more of the message beyond what was attached.

What you posted should be sufficient, just ensure it hits all of the FP rules you are concerned about.

I notice that the original message hit FROM_MISSP_URI but there's no URI in the sample you provided. For completeness there should be one; if you don't want to leak the real URI then replace it with something innocuous like a link to google or your home page or something.
Created attachment 5112 [details] replaced a sample mail message

> What you posted should be sufficient, just ensure it hits
> all of the FP rules you are concerned about.
> I notice that the original message hit FROM_MISSP_URI but
> there's no URI in the sample you provided. For completeness
> there should be one;

Done - attached.
The rule FROM_MISSP_URI has been disabled in r1441795 and the rule FROM_MISSP_EH_MATCH is not triggered by this sample email any more.