Bug 6864 - Excessive score (6.1) from FROM_MISSP_URI, FROM_MISSP_EH_MATCH, TO_NO_BRKTS_FROM_MSSP
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: SVN Trunk (Latest Devel Version)
Hardware: All All
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2012-11-09 16:43 UTC by Mark Martinec
Modified: 2018-03-23 22:02 UTC

Attachment                      Type        Status  Submitter/CLA Status
------------------------------  ----------  ------  ----------------------
sample mail message             text/plain  None    Mark Martinec [HasCLA]
replaced a sample mail message  text/plain  None    Mark Martinec [HasCLA]

Description Mark Martinec 2012-11-09 16:43:35 UTC
Today I was investigating a false positive for a mail from our
Japanese research colleagues at KEK.jp. It turns out that the following
perfectly valid From line caused a collection of 6.1 score points
solely because there is no (optional) space between a display name
and the address:

From: =?ISO-2022-JP?B?VXNlciBTdXBwb3J0IFN5c3RlbS4=?=<usersys1@ml.post.kek.jp>

The sample message is attached (obfuscated and stripped of irrelevant
content; the set of rule hits is unchanged).
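
To make the From line above easier to read, here is a minimal Python sketch (not part of SpamAssassin) that decodes its display name. The helper name `split_from` and its regular expression are illustrative assumptions; this is not a full RFC 5322 parser and only handles the single `encoded-word<addr>` shape shown in this report.

```python
import base64
import re

FROM_VALUE = "=?ISO-2022-JP?B?VXNlciBTdXBwb3J0IFN5c3RlbS4=?=<usersys1@ml.post.kek.jp>"

# Hypothetical pattern: one base64 encoded-word, optional whitespace, then <addr>.
_FROM_RE = re.compile(
    r"^=\?(?P<charset>[^?]+)\?[Bb]\?(?P<payload>[^?]+)\?="
    r"(?P<gap>[ \t]*)<(?P<addr>[^<>]+)>$"
)

def split_from(value):
    """Return (display_name, gap, address) for an 'encoded-word<addr>' value."""
    m = _FROM_RE.match(value)
    if m is None:
        raise ValueError("unsupported From shape")
    raw = base64.b64decode(m.group("payload"))
    # The charset label from the header is passed to Python's codec lookup.
    name = raw.decode(m.group("charset"))
    return name, m.group("gap"), m.group("addr")

name, gap, addr = split_from(FROM_VALUE)
print(repr(name), repr(gap), addr)
```

The decoded display name is the plain ASCII text "User Support System.", which is consistent with the FROM_EXCESS_BASE64 hit in the report below, and `gap` comes back as an empty string: the missing whitespace before `<` is the only unusual thing about the field.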

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.5 RELAY_JP               Relayed through Japan
-0.1 RP_MATCHES_RCVD        Envelope sender domain matches handover relay domain
 1.1 DCC_CHECK              Detected as bulk mail by DCC (dcc-servers.net)
-1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                            [score: 0.0000]
 1.0 FROM_EXCESS_BASE64     From: base64 encoded unnecessarily
 1.5 TO_NO_BRKTS_FROM_MSSP  Multiple formatting errors
 2.5 FROM_MISSP_EH_MATCH    From misspaced, matches envelope
 2.1 FROM_MISSP_URI         From misspaced, has URI
-0.2 AWL                    AWL: From: address is in the auto white-list

I think the combined score of the rules FROM_MISSP_EH_MATCH, FROM_MISSP_URI
and TO_NO_BRKTS_FROM_MSSP should be capped: a single unusual (but valid)
formatting choice in a From header field should not collect 6.1 points.
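
The arithmetic behind the 6.1 figure, and the effect a cap would have, can be sketched in Python. The scores are copied from the report above; the function `capped_total` and the cap value of 3.0 are purely hypothetical illustrations of the proposal, not a description of how SpamAssassin combines scores.

```python
from decimal import Decimal

# Scores copied from the rule report in this bug.
HITS = {
    "RELAY_JP": Decimal("0.5"),
    "RP_MATCHES_RCVD": Decimal("-0.1"),
    "DCC_CHECK": Decimal("1.1"),
    "BAYES_00": Decimal("-1.9"),
    "FROM_EXCESS_BASE64": Decimal("1.0"),
    "TO_NO_BRKTS_FROM_MSSP": Decimal("1.5"),
    "FROM_MISSP_EH_MATCH": Decimal("2.5"),
    "FROM_MISSP_URI": Decimal("2.1"),
    "AWL": Decimal("-0.2"),
}

# The three rules that all react to the same From formatting.
FROM_FORMAT_RULES = ("TO_NO_BRKTS_FROM_MSSP", "FROM_MISSP_EH_MATCH", "FROM_MISSP_URI")

def capped_total(hits, group, cap):
    """Sum all scores, letting the rules in `group` contribute at most `cap`."""
    group_sum = sum((hits[r] for r in group if r in hits), Decimal("0"))
    rest = sum((s for r, s in hits.items() if r not in group), Decimal("0"))
    return rest + min(group_sum, cap)

group_sum = sum(HITS[r] for r in FROM_FORMAT_RULES)
total = sum(HITS.values())
print(group_sum, total, capped_total(HITS, FROM_FORMAT_RULES, Decimal("3.0")))
```

The three rules add up to 6.1 and the message total is 6.5, which is above the default required_score of 5.0 if that default is in use. With the made-up cap of 3.0 the total would drop to 3.4.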
Comment 1 Mark Martinec 2012-11-09 16:44:07 UTC
Created attachment 5111
sample mail message
Comment 2 John Hardin 2012-11-09 17:26:46 UTC
(In reply to comment #0)
> Today I was investigating a false positive for a mail from our
> Japanese research colleagues at KEK.jp. It turns out that the following
> perfectly valid From line caused a collection of 6.1 score points
> solely because there is no (optional) space between a display name
> and the address:

Granted it's valid, but it's apparently very common in spam generated by sloppy tools and not common in mail from well-written MUAs. Could you get the X-Mailer from the headers so we know what generated that From: line? That header was not in the sample you attached.

Please see if this message (and any other similar FPs) can be added to a masscheck ham corpus so that the generated scores can better reflect this.

I will see about tuning the rules; any change I make will probably be rather focused, either to having an encoded display name with no space, or using that specific mailer.

I also note the lack of a space after the To: - whatever mailer your colleague is using needs some cleanup. :)
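
The first of the two narrower conditions mentioned above (an encoded display name with no space before the address) could be tested roughly as follows. This is a Python sketch, not SpamAssassin rule syntax and not the actual rule change; the pattern and the function name `encoded_name_missing_space` are assumptions made for illustration.

```python
import re

# Assumed pattern: an RFC 2047 encoded-word ("=?charset?B-or-Q?text?=")
# immediately followed by "<", with no whitespace in between.
ENCODED_NAME_NO_SPACE = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=<")

def encoded_name_missing_space(from_value):
    """True if an encoded display name runs straight into the <address>."""
    return ENCODED_NAME_NO_SPACE.search(from_value) is not None

reported = "=?ISO-2022-JP?B?VXNlciBTdXBwb3J0IFN5c3RlbS4=?=<usersys1@ml.post.kek.jp>"
print(encoded_name_missing_space(reported))
```

A check this narrow would match the From line from this report but not the same line with a space inserted before `<`, and not an unencoded display name.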
Comment 3 Mark Martinec 2012-11-09 18:11:38 UTC
> Granted it's valid, but it's apparently very common in spam generated by
> sloppy tools and not common in mail from well-written MUAs. Could you get
> the X-Mailer from the headers so we know what generated that From: line?
> That header was not in the sample you attached.

The attached header is pretty much complete, I just replaced the username
with xxx and yyy, simplified Subject, and dropped local Received entries.
Unfortunately there was no X-Mailer or User-Agent header field.

> I also note the lack of a space after the To: - whatever mailer your
> colleague is using needs some cleanup. :)

Indeed there was no (optional) space after the To:. The message is a
user registration confirmation, apparently generated by some
automation script or some HR management software. The body is all
plain ASCII text and doesn't reveal much about how it was created;
it just states their Users Office contact address, a password and a
few instructions.
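
As a small illustration that the space after the colon is optional, Python's standard `email` parser returns the same value either way. The address and subject below are placeholders, not taken from the real message, and `to_value` is just a helper name for this sketch.

```python
from email import message_from_string

def to_value(raw_message):
    """Parse a raw message and return the value of its To header field."""
    return message_from_string(raw_message)["To"]

# Placeholder messages: identical except for the space after "To:".
NO_SPACE = "To:xxx@example.org\nSubject: test\n\nbody\n"
WITH_SPACE = "To: xxx@example.org\nSubject: test\n\nbody\n"

print(repr(to_value(NO_SPACE)), repr(to_value(WITH_SPACE)))
```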

> Please see if this message (and any other similar FPs) can be added to a
> masscheck ham corpus so that the generated scores can better reflect this.

Not sure if I can reveal more of the message beyond what was attached.

> I will see about tuning the rules; any change I make will probably be rather
> focused, either to having an encoded display name with no space, or using
> that specific mailer.

I'm not complaining about the individual scores of these three rules,
but their cumulative effect seems excessive for a simple unusual formatting.
Comment 4 Darxus 2012-11-09 18:17:47 UTC
Changing SpamAssassin for a single false positive clearly caused by poorly written software seems weird. But I think we could still use some work improving automated handling of overlapping rules.
Comment 5 John Hardin 2012-11-10 18:45:56 UTC
(In reply to comment #3)
> > Please see if this message (and any other similar FPs) can be added to a
> > masscheck ham corpus so that the generated scores can better reflect this.
> 
> Not sure if I can reveal more of the message beyond what was attached.

What you posted should be sufficient; just ensure it hits all of the FP rules you are concerned about.

I notice that the original message hit FROM_MISSP_URI but there's no URI in the sample you provided. For completeness there should be one; if you don't want to leak the real URI then replace it with something innocuous like a link to Google or your home page.
Comment 6 Mark Martinec 2012-11-11 00:05:23 UTC
Created attachment 5112
replaced a sample mail message

> What you posted should be sufficient, just ensure it hits
> all of the FP rules you are concerned about.
> I notice that the original message hit FROM_MISSP_URI but
> there's no URI in the sample you provided. For completeness
> there should be one;

Done - attached.
Comment 7 Giovanni Bechis 2018-03-23 22:02:34 UTC
The rule FROM_MISSP_URI has been disabled in r1441795 and the rule FROM_MISSP_EH_MATCH is not triggered by this sample email any more.