Bug 5666

Summary: [review] New Outlook Message-ID format causes FORGED_MUA_OUTLOOK FP
Product: Spamassassin    Reporter: Joseph Brennan <brennan>
Component: Rules    Assignee: SpamAssassin Developer Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: sidney
Priority: P5    
Version: 3.2.3   
Target Milestone: 3.2.5   
Hardware: Other   
OS: other   
Whiteboard: go
Attachments: header portion of message
header file with new Outlook Message-ID
Here is my patch to handle the new Outlook Message-ID format
Patch to register gated_through_received_hdr_remover as a header eval sub

Description Joseph Brennan 2007-10-02 08:26:52 UTC
New Outlook Message-id format seen.  FORGED_MUA_OUTLOOK matches a real Outlook message.
Comment 1 Joseph Brennan 2007-10-02 08:30:59 UTC
Created attachment 4140 [details]
header portion of message

Note Message-ID.
Comment 2 Theo Van Dinter 2007-10-02 08:36:15 UTC
It looks like the mail went to hotmail, which set/changed the Message-Id header:

Message-ID: <BAYC1-PASMTP13AE1E5C7CC71C6EDEB958CEAE0@CEZ.ICE>
Received: from peterfls0fphe8 ([71.113.91.222]) by
BAYC1-PASMTP13.bayc1.hotmail.com over TLS secured channel with Microsoft
SMTPSVC(6.0.3790.2668);
	 Tue, 2 Oct 2007 06:50:25 -0700

Note the "BAYC1-PASMTP" prefix on the Message-ID head and compare to the first
host in the Received header.
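
The comparison described above can be sketched in a few lines of Python. This is only an illustration of the check, not SpamAssassin code: the function name is hypothetical, and it assumes the first "by" host in the Received header is the server that may have rewritten the Message-ID.

```python
import re


def msgid_prefix_matches_received_host(message_id, received):
    """Return True if the Message-ID local part starts with the short
    name of the first "by" host named in the Received header."""
    by_host = re.search(r"\bby\s+([A-Za-z0-9.-]+)", received)
    local_part = re.match(r"<([^@>]+)@", message_id.strip())
    if not by_host or not local_part:
        return False
    # Keep only the left-most label, e.g. "BAYC1-PASMTP13".
    short_name = by_host.group(1).split(".")[0]
    return local_part.group(1).upper().startswith(short_name.upper())


message_id = "<BAYC1-PASMTP13AE1E5C7CC71C6EDEB958CEAE0@CEZ.ICE>"
received = (
    "from peterfls0fphe8 ([71.113.91.222]) by "
    "BAYC1-PASMTP13.bayc1.hotmail.com over TLS secured channel with "
    "Microsoft SMTPSVC(6.0.3790.2668); Tue, 2 Oct 2007 06:50:25 -0700"
)
print(msgid_prefix_matches_received_host(message_id, received))
```

For the headers quoted in this comment the check succeeds, which is consistent with the server having set the Message-ID.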
Comment 3 Joseph Brennan 2007-10-02 08:48:28 UTC
He called himself an MSN user.  I know, same thing.
Comment 4 Paul Griffith 2007-10-16 06:30:31 UTC
Created attachment 4161 [details]
header file with new Outlook Message-ID
Comment 5 Paul Griffith 2007-10-16 09:44:36 UTC
Created attachment 4162 [details]
Here is my patch to handle the new Outlook Message-ID format

Here is my first run at a patch to fix this problem.
Comment 6 Sidney Markowitz 2008-01-07 11:54:31 UTC
*** Bug 5767 has been marked as a duplicate of this bug. ***
Comment 7 Sidney Markowitz 2008-01-07 11:57:33 UTC
*** Bug 5765 has been marked as a duplicate of this bug. ***
Comment 8 Sidney Markowitz 2008-01-07 12:18:10 UTC
Changing summary so this bug will be found when searching for the FP error it
causes.

Note that comment #6 was a mistake: I reopened bug 5767 after I mistakenly
marked it a duplicate of this one.

I would prefer to fix this as I described in bug 5765, using \d+[A-Z0-9]{25} or
perhaps \d\d[A-Z0-9]{25} in __OE_MSGID_4 instead of [A-Z0-9]{27} as in
attachment 4162 [details], so as to be more consistent with __OE_MSGID_3 and to separate
the host name from the 25 character UID. But I would not argue the case strongly
if anyone disagrees.
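
To make the difference between the candidate patterns concrete, here is a small Python sketch. The surrounding prefix and anchors are assumptions for illustration only (the real __OE_MSGID_4 rule text is not quoted in this comment), and the three-digit example ID is made up.

```python
import re

# Hypothetical prefix; only the tails below come from the discussion.
PREFIX = r"^<[A-Z0-9]+-PASMTP"

flat = re.compile(PREFIX + r"[A-Z0-9]{27}@")             # as in attachment 4162
split_any = re.compile(PREFIX + r"(\d+)([A-Z0-9]{25})@")  # \d+ host number
split_two = re.compile(PREFIX + r"(\d\d)([A-Z0-9]{25})@")  # two-digit host number

# Message-ID quoted in comment 2.
seen = "<BAYC1-PASMTP13AE1E5C7CC71C6EDEB958CEAE0@CEZ.ICE>"
# Made-up ID with a three-digit host number, for comparison only.
made_up = "<BAYC1-PASMTP103AE1E5C7CC71C6EDEB958CEAE0@CEZ.ICE>"

for msgid in (seen, made_up):
    print(msgid)
    print("  flat     :", bool(flat.match(msgid)))
    print("  split_any:", bool(split_any.match(msgid)))
    print("  split_two:", bool(split_two.match(msgid)))

# The split form exposes host number and UID as separate groups.
host_number, uid = split_any.match(seen).groups()
print(host_number, uid)
```

All three forms accept the ID seen so far. Only the \d+ form would also accept a host number with more digits, and both split forms keep the 25 character UID separate from the host name.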
Comment 9 Justin Mason 2008-01-07 12:25:24 UTC
> I would prefer to fix this as I described in bug 5765, using \d+[A-Z0-9]{25} or
> perhaps \d\d[A-Z0-9]{25} in __OE_MSGID_4 instead of [A-Z0-9]{27} as in
> attachment 4162 [details], so as to be more consistent with __OE_MSGID_3 and to separate
> the host name from the 25 character UID. But I would not argue the case strongly
> if anyone disagrees.

+1
Comment 10 Sidney Markowitz 2008-01-07 13:14:16 UTC
(reply to comment #9)
> +1

Great, but I still have the problem I posted to the dev list that rule changes I
made in my sandbox have not appeared in the rule qa mass check. I don't want to
commit a rule change if I'm doing something wrong when checking it.
Comment 11 Justin Mason 2008-01-07 14:41:17 UTC
(In reply to comment #10)
> (reply to comment #9)
> > +1
> 
> Great, but I still have the problem I posted to the dev list that rule changes I
> made in my sandbox have not appeared in the rule qa mass check. I don't want to
> commit a rule change if I'm doing something wrong when checking it.

yeah, we need to figure that out.  It appears your rules are not being used
in mass-checks.

actually, could you open a *separate* bug for that?  it's definitely a code
bug somewhere....
Comment 12 Justin Mason 2008-01-07 15:29:23 UTC
> actually, could you open a *separate* bug for that?  it's definitely a code
> bug somewhere....

I take that back.  see dev list.
Comment 13 Sidney Markowitz 2008-01-07 16:02:14 UTC
Ok, the question about rule qa was resolved on dev list.

I received a comment from someone who wanted it posted here anonymously:

"Unfortunately, many systems rewrite the message-id (e.g. Netscape/iPlanet/Sun
mail server versions), you can't count on it not happening."

My responses: 1) Feel free to create a gmail/yahoo/hotmail/spamgourmet/whatever
account to post comments here pseudonymously. 2) So far we haven't seen many
reports of FPs and have dealt with them one at a time. I would like to see
examples of Netscape/iPlanet/Sun mail server and any other Message-IDs. The more
we can collect here the more we can handle with a single fix. If we have enough
it may be worth creating a separate meta rule whose job is to test for known
server-side Message-ID formats.

For now I think I will update the rule to address this FP, then look at any more
information we get about Message-ID formats.
Comment 14 Sidney Markowitz 2008-01-07 23:49:15 UTC
Committed to trunk rules revision 609896
Committed to branch 3.2 rules revision 609898
Committed to 3.2 update channel revision 609899

If anyone provides me with other Message-ID formats used by other servers I'll
consider opening a new bug for additional fixes as appropriate.
Comment 15 Sidney Markowitz 2008-01-09 05:12:44 UTC
I'm reopening this because I just thought of a possible much better fix.

Every example I've seen of this Message-ID format, including those in the
attachments here, and the examples I found through Google search such as I
linked to in bug 5765, have had a Received header come after the Message-ID
header, which trips the MSGID_FROM_MTA_HEADER rule. It seems to me that if the
server generates a Message-ID, then the Message-ID obviously cannot come from
the MUA. A hit on the MSGID_FROM_MTA_HEADER rule should therefore prevent
FORGED_MUA_OUTLOOK from triggering. This can be done with

meta __FORGED_OE  (__OE_MUA && !__OE_MSGID_1 && !__OE_MSGID_2 && !__OE_MSGID_3
&& !MSGID_FROM_MTA_HEADER  && !__UNUSABLE_MSGID)

Some of the examples in bug 4065 do not have a Received header after the
Message-ID header, so there is still a need for __OE_MSGID_3, but as far as I
have seen this would fix these FPs without having to define __OE_MSGID_4, and it
would prevent similar FPs from any other MTA that does something similar to
sympatico.ca's servers.

The problem I have with changing the rule without further discussion is that the
mass-check corpus doesn't seem to have any examples of this. I can show that
the rule does no harm, but we don't have example FPs in the corpus.

Any thoughts on this?
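
For anyone who wants to reason about the proposed meta outside SpamAssassin, its boolean logic can be modelled in Python. This is a sketch of the expression quoted above only; representing rule hits as a set of names, and the function name forged_oe, are assumptions made for the illustration.

```python
def forged_oe(hits):
    """Evaluate the proposed __FORGED_OE meta against a set of rule names
    that hit on one message."""
    return (
        "__OE_MUA" in hits
        and "__OE_MSGID_1" not in hits
        and "__OE_MSGID_2" not in hits
        and "__OE_MSGID_3" not in hits
        and "MSGID_FROM_MTA_HEADER" not in hits
        and "__UNUSABLE_MSGID" not in hits
    )


# Claims OE, Message-ID matches no known OE format: the meta fires.
print(forged_oe({"__OE_MUA"}))
# Same message, but the Message-ID header was added by the MTA: suppressed.
print(forged_oe({"__OE_MUA", "MSGID_FROM_MTA_HEADER"}))
```

The second call shows the intended effect: a Message-ID added by the server no longer counts against a message that claims to come from Outlook.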
Comment 16 Justin Mason 2008-01-09 05:50:32 UTC
(In reply to comment #15)
> Any thoughts on this?

it sounds reasonable, at least.... and if the results are equal to or better
than the current state of affairs in the nightly mass-check, that's good enough IMO.
Comment 17 Sidney Markowitz 2008-01-09 06:52:15 UTC
(in reply to comment #16)
I've checked in T_SIDNEY_FORGED_MUA_OUTLOOK_A to my sandbox.

If it does no worse than FORGED_MUA_OUTLOOK in the next nightly mass-check I'll
commit the new version.
Comment 18 Sidney Markowitz 2008-01-10 11:39:59 UTC
Justin, there are two more spam hits in FORGED_MUA_OUTLOOK than in
T_SIDNEY_FORGED_MUA_OUTLOOK_A, both apparently recent ones in your spam corpus.
Could you check to see what those are?

In 20080110-r610722-n, in jm corpus,
FORGED_MUA_OUTLOOK hits 583 spams,
T_SIDNEY_FORGED_MUA_OUTLOOK_A hits 581 spams
Comment 19 Sidney Markowitz 2008-01-10 23:34:34 UTC
Justin, I found the log entries for the two spam hits, but I don't know where to
find the original spam emails. The beginning of the two log lines from
spam-jm.log are:

Y 14 /local/cor/recent/spam/trap.200801080000/063438.20080108 FORGED_MUA_OUTLOOK,

Y 12 /local/cor/recent/spam/trap.200801081600/201827.20080108 FORGED_MUA_OUTLOOK,

Both of them have MSGID_FROM_MTA_HEADER but also hit FORGED_MUA_OUTLOOK.

Can you show me these spams?
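
A search like this can be scripted. The Python sketch below assumes each log line has four whitespace-separated leading fields (result flag, score, path, comma-separated rule names); that layout is inferred from the two truncated lines above, and the sample lines and paths in the code are placeholders, not real log content.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    flag: str
    score: float
    path: str
    rules: frozenset


def parse_log_line(line):
    """Parse one log line, or return None for blanks, comments, short lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) < 4:
        return None
    flag, score, path, rule_field = fields[:4]
    rules = frozenset(name for name in rule_field.split(",") if name)
    return LogEntry(flag, float(score), path, rules)


def paths_hitting_all(lines, wanted):
    """Return paths of messages that hit every rule name in `wanted`."""
    wanted = frozenset(wanted)
    entries = (parse_log_line(line) for line in lines)
    return [e.path for e in entries if e is not None and wanted <= e.rules]


# Placeholder lines in the assumed format.
sample = [
    "Y 14 /example/spam/a FORGED_MUA_OUTLOOK,MSGID_FROM_MTA_HEADER",
    "Y 12 /example/spam/b FORGED_MUA_OUTLOOK,",
    ". 1 /example/ham/c MSGID_FROM_MTA_HEADER",
]
print(paths_hitting_all(sample, ["FORGED_MUA_OUTLOOK", "MSGID_FROM_MTA_HEADER"]))
```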
Comment 20 Justin Mason 2008-01-11 01:07:05 UTC
(In reply to comment #19)
> Justin, I found the log entries for the two spam hits, but I don't know where to
> find the original spam emails. The beginning of the two log lines from
> spam-jm.log are:
> 
> Y 14 /local/cor/recent/spam/trap.200801080000/063438.20080108 FORGED_MUA_OUTLOOK,
> 
> Y 12 /local/cor/recent/spam/trap.200801081600/201827.20080108 FORGED_MUA_OUTLOOK,
> 
> Both of them have MSGID_FROM_MTA_HEADER but also hit FORGED_MUA_OUTLOOK.
> 
> Can you show me these spams?

sure --

http://taint.org/x/2008/063438.20080108
http://taint.org/x/2008/201827.20080108

(I don't know if you figured it out, but if you hit the "[logs]" link in
the per-user freqs table on the rule-detail page of rule-QA, it'll
give you the lines from the logs that you're looking for, allowing you
to get those paths.)
Comment 21 Sidney Markowitz 2008-01-11 03:03:04 UTC
(in reply to comment #20)

One of those is spam and the other a widely distributed phishing attempt. This
is making me have second thoughts about the reasoning I expressed in comment
#15. The problem is that there are two ways that you can end up with
MSGID_FROM_MTA_HEADER. One is like some of the sympatico.ca mails where the MTA
substitutes its own Message-ID in every mail sent through it. This does not
indicate a forged Outlook Express mail even if the Message-Id format is not one
generated by Outlook Express. The other way it can happen is when the mail
falsely claims to have been sent by OE, but the sender does not generate any
Message-Id, so that the MTA has to. That case does indicate a forged MUA of
Outlook Express.

The statistics on either of those two situations are just too sparse for me to
know which is more likely.
Comment 22 Sidney Markowitz 2008-01-12 09:40:50 UTC
After some more looking and thinking and discussing, I've come to a different
viewpoint. Looking at the comment in 20_ratware.cf for __UNUSABLE_MSGID, I see

> # Dec 17 2002 jm: this means "message ID is either too old or has been
> # rewritten by a gateway".  Made into an eval test since meta tests cannot
> # (yet) chain from other meta tests.
> header __UNUSABLE_MSGID   eval:check_messageid_not_usable()

It looks like this is meant to be a meta rule that collects the Message-ID
formats that are allowed to be exceptions to any Message-ID based forgery rule.
Since meta tests _can_ now be chained from other meta tests (such as meta
__FORGED_OE being part of the definition of meta FORGED_MUA_OUTLOOK) shouldn't
we be able to now make __UNUSABLE_MSGID a meta rule? And then it could encompass
the tests we now have in __OE_MSGID_3 and __OE_MSGID_4.

Where I went wrong in comment #15 is that I was assuming that FORGED_MUA_OUTLOOK
is just testing for the Message-ID being of a format that is produced by OE/OL,
and therefore should not hit when the Message-ID is added later by the MTA.
That's wrong. We are using whitelist filtering, in which FORGED_MUA_OUTLOOK hits
whenever there is an unrecognized Message-ID for any reason except for the known
good cases.

So how about we replace the eval rule __UNUSABLE_MSGID with a meta of a few
header rules including the tests of __OE_MSGID_3 and __OE_MSGID_4? Am I wrong in
my assumption that we now can chain meta rules and the restriction in the
comment no longer applies?
Comment 23 Sidney Markowitz 2008-01-12 17:06:18 UTC
I've checked some test rules into my sandbox to see how it does. To do this I
exposed a sub as a header eval rule:

svn ci -m "bug 5666: expose sub gated_through_received_hdr_remover() as an eval
rule to allow moving the rest of check_messageid_not_usable from eval rules into
a meta rule" lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
Sending        lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
Transmitting file data .
Committed revision 611510.
Comment 24 Sidney Markowitz 2008-01-13 03:25:53 UTC
Justin,

This is the only FP for FORGED_MUA_OUTLOOK in the last nightly run. It is in
net-jm. Could you look at it and see if it really is ham, and send me a
sanitised copy if possible?

/local/cor/recent/ham/priv.20070403/10
 
Comment 25 Justin Mason 2008-01-13 05:56:16 UTC
it's a ham: http://taint.org/x/2008/outlookfp.txt
Comment 26 Sidney Markowitz 2008-01-13 10:42:57 UTC
(in reply to comment #25)

Do you have any insight as to how the Message-ID in that message got to be the
way it is? If it is either Mailman or the server that is handling that mailing
list, I would expect that there would be more such FPs on that list.
Comment 27 Sidney Markowitz 2008-01-13 10:57:03 UTC
Justin, here are three more spams I need to see -- There's a bug in my rule test.

Y  5 /local/cor/recent/spam/low.wall.200801110200/5
Y  5 /local/cor/recent/spam/low.wall.200801112200/19
Y  5 /local/cor/recent/spam/low.wall.200801120200/2

Comment 28 Justin Mason 2008-01-13 12:34:49 UTC
(In reply to comment #26)
> (in reply to comment #25)
> 
> Do you have any insight as to how the Message-ID in that message got to be the
> way it is? If it is either Mailman or the server that is handling that mailing
> list, I would expect that there would be more such FPs on that list.

not a clue, I'm afraid.

here's those 3 spams:

http://taint.org/x/2008/sidneyspams.tgz
Comment 29 Sidney Markowitz 2008-01-15 13:01:37 UTC
Here are some things that I came across that could affect some details of these
rules. I'm noting them here to make sure that they get documented somewhere.

I found a number of emails that hit both __OE_MSGID_2 and __OE_MSGID_4 because
sympatico.ca sometimes (but not most of the time) did not remove the original
Message-ID when adding a new one. Luckily that doesn't interfere with what these
rules are looking for, but I can imagine someone writing rules that assume the
existence of only one Message-ID header.

Something else I found is that a significant percentage of the __OE_MSGID_3 and
__OE_MSGID_4 hits occur without an Outlook or Outlook Express mailer header.
That indicates that the MTA adds a Message-ID header even when the MUA is not OL/OE.
I think that means that the __OE_MSGID_3 and __OE_MSGID_4 tests should be made
part of the UNUSABLE_MSGID subrule, not part of an OE-specific rule. They both
indicate that an ISP is rewriting the Message-ID and therefore the Message-ID
cannot be used as part of a spam-sign. If we do that the names should be changed
to something that refers to the ISP and not OE.

I'm going to do some more testing with my sandbox to get this right.
Comment 30 Sidney Markowitz 2008-02-07 03:35:30 UTC
Created attachment 4252 [details]
Patch to register gated_through_received_hdr_remover as a header eval sub

I haven't looked at this for a bit, but the sandbox rules are doing the same or
a little better than their equivalents, so I'm going to check the changes in to
trunk

Committed revision 619364.

Here are the relevant sandbox results:

8.1972	0.0079	0.999  T_SIDNEY_FORGED_MUA_OUTLOOK
8.1972	0.0091	0.999  FORGED_MUA_OUTLOOK		

0.0998	0.0000	1.000  T_SIDNEY__FORGED_OUTLOOK_DOLLARS 		
0.0998	0.0012	0.988  __FORGED_OUTLOOK_DOLLARS

T_SIDNEY_FORGED_MUA_IMS same results as FORGED_MUA_IMS
T_SIDNEY_FORGED_MUA_OIMO same results as FORGED_MUA_OIMO
T_SIDNEY_FORGED_MUA_EUDORA same results as FORGED_MUA_EUDORA
T_SIDNEY_FORGED_MUA_MOZILLA same results as FORGED_MUA_MOZILLA
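
As a quick sanity check on the figures above, the third column can be recomputed from the first two. The Python sketch below assumes the columns are spam hit percentage, ham hit percentage, and S/O (spam share of all hits); the column headings are not shown in the results, so that reading is an assumption.

```python
def spam_over_all(spam_pct, ham_pct):
    """S/O: fraction of a rule's hits that fall on spam."""
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.0


rows = [
    ("T_SIDNEY_FORGED_MUA_OUTLOOK", 8.1972, 0.0079),
    ("FORGED_MUA_OUTLOOK", 8.1972, 0.0091),
    ("T_SIDNEY__FORGED_OUTLOOK_DOLLARS", 0.0998, 0.0000),
    ("__FORGED_OUTLOOK_DOLLARS", 0.0998, 0.0012),
]

for name, spam_pct, ham_pct in rows:
    print(f"{spam_pct:.4f}  {ham_pct:.4f}  {spam_over_all(spam_pct, ham_pct):.3f}  {name}")
```

Under that reading the recomputed values round to the same S/O figures listed above, and the sandbox rules differ from their equivalents only in the lower ham percentage.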

I would just go ahead with adding them to 3.2 rules also, but this requires one
code change, so I need a vote. The change is to add one line to register sub
gated_through_received_hdr_remover() for use in an eval rule. I have attached
the patch for it.

Can I have votes for making that change in 3.2 branch?
Comment 31 Justin Mason 2008-02-07 03:44:39 UTC
+1 looks good
Comment 32 Daryl C. W. O'Shea 2008-02-07 09:32:37 UTC
sure, +1
Comment 33 Sidney Markowitz 2008-02-07 10:57:36 UTC
Committed to branch 3.2 revision 619567.

I also committed the rule changes that use the new eval.