SA Bugzilla – Full Text Bug Listing

| Summary: | [review] New Outlook Message-ID format causes FORGED_MUA_OUTLOOK FP | | |
|---|---|---|---|
| Product: | Spamassassin | Reporter: | Joseph Brennan <brennan> |
| Component: | Rules | Assignee: | SpamAssassin Developer Mailing List <dev> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | CC: | sidney |
| Priority: | P5 | | |
| Version: | 3.2.3 | | |
| Target Milestone: | 3.2.5 | | |
| Hardware: | Other | | |
| OS: | other | | |
| Whiteboard: | go | | |

Attachments:
header portion of message
header file with new Outlook Message-ID
Here is my patch to handle the new Outlook Message-ID format
Patch to register gated_through_received_hdr_remover as a header eval sub

Description
Joseph Brennan
2007-10-02 08:26:52 UTC
Created attachment 4140 [details]
header portion of message
Note Message-ID.
It looks like the mail went to hotmail, which set/changed the Message-Id header:

Message-ID: <BAYC1-PASMTP13AE1E5C7CC71C6EDEB958CEAE0@CEZ.ICE>
Received: from peterfls0fphe8 ([71.113.91.222]) by BAYC1-PASMTP13.bayc1.hotmail.com over TLS secured channel with Microsoft SMTPSVC(6.0.3790.2668); Tue, 2 Oct 2007 06:50:25 -0700

Note the "BAYC1-PASMTP" prefix on the Message-ID head and compare to the first host in the Received header.

He called himself an MSN user. I know, same thing.

Created attachment 4161 [details]
header file with new Outlook Message-ID
Created attachment 4162 [details]
Here is my patch to handle the new Outlook Message-ID format
Here is my first run at a patch to fix this problem.
*** Bug 5767 has been marked as a duplicate of this bug. ***

*** Bug 5765 has been marked as a duplicate of this bug. ***

Changing summary so this bug will be found when searching for the FP error it causes.

Note that comment #6 was a mistake and I reopened the bug after I mistakenly marked it a duplicate of this one.

I would prefer to fix this as I described in bug 5765, using \d+[A-Z0-9]{25} or perhaps \d\d[A-Z0-9]{25} in __OE_MSGID_4 instead of [A-Z0-9]{27} as in attachment 4162 [details], so as to be more consistent with __OE_MSGID_3 and to separate the host name from the 25 character UID. But I would not argue the case strongly if anyone disagrees.

> I would prefer to fix this as I described in bug 5765, using \d+[A-Z0-9]{25} or
> perhaps \d\d[A-Z0-9]{25} in __OE_MSGID_4 instead of [A-Z0-9]{27} as in
> attachment 4162 [details], so as to be more consistent with __OE_MSGID_3 and to separate
> the host name from the 25 character UID. But I would not argue the case strongly
> if anyone disagrees.
+1

(reply to comment #9)
> +1
Great, but I still have the problem I posted to the dev list that rule changes I made in my sandbox have not appeared in the rule qa mass check. I don't want to commit a rule change if I'm doing something wrong when checking it.

(In reply to comment #10)
> (reply to comment #9)
> > +1
>
> Great, but I still have the problem I posted to the dev list that rule changes I
> made in my sandbox have not appeared in the rule qa mass check. I don't want to
> commit a rule change if I'm doing something wrong when checking it.
yeah, we need to figure that out. It appears your rules are not being used in mass-checks.

actually, could you open a *separate* bug for that? it's definitely a code bug somewhere....

> actually, could you open a *separate* bug for that? it's definitely a code
> bug somewhere....
I take that back. see dev list.
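
For readers comparing the three pattern fragments proposed above for __OE_MSGID_4, here is a minimal Python sketch. It tests only the fragments quoted in this bug against the tail of the Message-ID from attachment 4140 (the part after the "BAYC1-PASMTP" prefix). The rest of the rule is not shown in this bug and is not reproduced here; the helper name `split_tail` and the `HYPOTHETICAL` tail are made up for illustration.

```python
import re

# Local-part tail of the Message-ID from attachment 4140, i.e. what follows
# the "BAYC1-PASMTP" prefix: two host digits plus a 25-character UID.
TAIL = "13AE1E5C7CC71C6EDEB958CEAE0"

# Only the fragments quoted in this bug; the rest of the rule is omitted.
FLAT = re.compile(r"[A-Z0-9]{27}")               # as in attachment 4162
SPLIT_ANY = re.compile(r"(\d+)([A-Z0-9]{25})")   # proposed in bug 5765
SPLIT_TWO = re.compile(r"(\d\d)([A-Z0-9]{25})")  # stricter alternative


def split_tail(pattern, tail):
    """Return (host_digits, uid) if the whole tail matches, else None."""
    m = pattern.fullmatch(tail)
    if m is None:
        return None
    # The flat pattern has no groups, so it cannot tell host digits from UID.
    return m.groups() if m.groups() else (None, m.group(0))


for name, pat in [("flat", FLAT), ("split_any", SPLIT_ANY), ("split_two", SPLIT_TWO)]:
    print(name, split_tail(pat, TAIL))

# Hypothetical tail with letters where the host digits would be: the flat
# pattern accepts it, the split patterns do not.
HYPOTHETICAL = "XY" + "A" * 25
```

All three fragments accept the sample tail; the difference is that the split forms separate the host digits from the 25-character UID and reject tails that do not start with digits.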
Ok, the question about rule qa was resolved on dev list.

I received a comment from someone who wanted it posted here anonymously:

"Unfortunately, many systems rewrite the message-id (e.g. Netscape/iPlanet/Sun mail server versions), you can't count on it not happening."

My responses:

1) Feel free to create a gmail/yahoo/hotmail/spamgourmet/whatever account to post comments here pseudonymously.

2) So far we haven't seen many reports of FPs and have dealt with them one at a time. I would like to see examples of Netscape/iPlanet/Sun mail server and any other Message-IDs. The more we can collect here the more we can handle with a single fix. If we have enough it may be worth creating a separate meta rule whose job is to test for known server-side Message-ID formats.

For now I think I will update the rule to address this FP, then look at any more information we get about Message-ID formats.

Committed to trunk rules revision 609896
Committed to branch 3.2 rules revision 609898
Committed to 3.2 update channel revision 609899

If anyone provides me with other Message-ID formats used by other servers I'll consider opening a new bug for additional fixes as appropriate.

I'm reopening this because I just thought of a possible much better fix.

Every example I've seen of this Message-ID format, including those in the attachments here, and the examples I found through Google search such as I linked to in bug 5765, have had a Received header come after the Message-ID header, which trips the MSGID_FROM_MTA_HEADER rule.

It seems to me that if the server generates a Message-ID, then the Message-ID obviously cannot come from the MUA. A hit on the MSGID_FROM_MTA_HEADER rule should prevent FORGED_MUA_OUTLOOK from triggering.

This can be done with

meta __FORGED_OE (__OE_MUA && !__OE_MSGID_1 && !__OE_MSGID_2 && !__OE_MSGID_3 && !MSGID_FROM_MTA_HEADER && !__UNUSABLE_MSGID)

Some of the examples in bug 4065 do not have a Received header after the Message-ID header, so there is still a need for __OE_MSGID_3, but as far as I have seen this would fix these FPs without having to define __OE_MSGID_4, and it would prevent similar FPs from any other MTA that does something similar to sympatico.ca's servers.

The problem I have with changing the rule without further discussion is that the mass-check corpus doesn't seem to have any examples of this. I can show that the rule does no harm, but we don't have example FPs in the corpus.

Any thoughts on this?

(In reply to comment #15)
> Any thoughts on this?
it sounds reasonable, at least.... and if the results are equal to or better than the current state of affairs in the nightly mass-check, that's good enough IMO.

(in reply to comment #16)
I've checked in T_SIDNEY_FORGED_MUA_OUTLOOK_A to my sandbox. If it does no worse than FORGED_MUA_OUTLOOK in the next nightly mass-check I'll commit the new version.

Justin, there are two more spam hits in FORGED_MUA_OUTLOOK than in T_SIDNEY_FORGED_MUA_OUTLOOK_A, all apparently recent ones in your spam corpus. Could you check to see what those are?

In 20080110-r610722-n, in jm corpus, FORGED_MUA_OUTLOOK hits 583 spams, T_SIDNEY_FORGED_MUA_OUTLOOK_A hits 581 spams

Justin, I found the log entries for the two spam hits, but I don't know where to find the original spam emails. The beginning of the two log lines from spam-jm.log are:

Y 14 /local/cor/recent/spam/trap.200801080000/063438.20080108 FORGED_MUA_OUTLOOK,

Y 12 /local/cor/recent/spam/trap.200801081600/201827.20080108 FORGED_MUA_OUTLOOK,

Both of them have MSGID_FROM_MTA_HEADER but also hit FORGED_MUA_OUTLOOK.

Can you show me these spams?

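
The boolean logic of the __FORGED_OE meta proposed in comment #15 can be sketched in Python as follows. This is only an illustration of how the expression combines subrule hits, not SpamAssassin code; the function name `forged_oe` and the two sample hit sets are hypothetical.

```python
# Subrules that the proposed meta negates; names are taken from comment #15.
NEGATED = (
    "__OE_MSGID_1",
    "__OE_MSGID_2",
    "__OE_MSGID_3",
    "MSGID_FROM_MTA_HEADER",
    "__UNUSABLE_MSGID",
)


def forged_oe(hits):
    """Evaluate the proposed __FORGED_OE meta for a set of subrule hits."""
    return "__OE_MUA" in hits and not any(rule in hits for rule in NEGATED)


# Claims an OE/OL mailer, Message-ID matches no known format: still flagged.
unknown_msgid = {"__OE_MUA"}
# Same mail, but an MTA supplied the Message-ID: no longer flagged.
mta_msgid = {"__OE_MUA", "MSGID_FROM_MTA_HEADER"}

print(forged_oe(unknown_msgid), forged_oe(mta_msgid))
```

The second case is the trade-off at issue: it removes the false positives caused by server-generated Message-IDs, but it also stops the rule from firing on mail like the two spams above, which hit MSGID_FROM_MTA_HEADER as well.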
(In reply to comment #19)
> Justin, I found the log entries for the two spam hits, but I don't know where to
> find the original spam emails. The beginning of the two log lines from
> spam-jm.log are:
>
> Y 14 /local/cor/recent/spam/trap.200801080000/063438.20080108 FORGED_MUA_OUTLOOK,
>
> Y 12 /local/cor/recent/spam/trap.200801081600/201827.20080108 FORGED_MUA_OUTLOOK,
>
> Both of them have MSGID_FROM_MTA_HEADER but also hit FORGED_MUA_OUTLOOK.
>
> Can you show me these spams?
sure --

http://taint.org/x/2008/063438.20080108
http://taint.org/x/2008/201827.20080108

(I don't know if you figured it out, but if you hit the "[logs]" link in the per-user freqs table on the rule-detail page of rule-QA, it'll give you the lines from the logs that you're looking for, allowing you to get those paths.)

(in reply to comment #20)
One of those is spam and the other a widely distributed phishing attempt. This is making me have second thoughts about the reasoning I expressed in comment #15.

The problem is that there are two ways that you can end up with MSGID_FROM_MTA_HEADER. One is like some of the sympatico.ca mails where the MTA substitutes its own Message-ID in every mail sent through it. This does not indicate a forged Outlook Express mail even if the Message-Id format is not one generated by Outlook Express. The other way it can happen is when the mail is forging being sent by OE, but does not generate any Message-Id so that the MTA has to. That case does indicate a forged MUA of Outlook Express.

The statistics on either of those two situations are just too sparse for me to know which is more likely.

After some more looking and thinking and discussing, I've come to a different viewpoint.

Looking at the comment in 20_ratware.cf for __UNUSABLE_MSGID, I see

> # Dec 17 2002 jm: this means "message ID is either too old or has been
> # rewritten by a gateway". Made into an eval test since meta tests cannot
> # (yet) chain from other meta tests.
> header __UNUSABLE_MSGID eval:check_messageid_not_usable()

It looks like this is meant to be a meta rule that collects the Message-ID formats that are allowed to be exceptions to any Message-ID based forgery rule. Since meta tests _can_ now be chained from other meta tests (such as meta __FORGED_OE being part of the definition of meta FORGED_MUA_OUTLOOK) shouldn't we be able to now make __UNUSABLE_MSGID a meta rule? And then it could encompass the tests we now have in __OE_MSGID_3 and __OE_MSGID_4.

Where I went wrong in comment #15 is that I was assuming that FORGED_MUA_OUTLOOK is just testing for the Message-ID being of a format that is produced by OE/OL, and therefore should not hit when the Message-ID is added later by the MTA. That's wrong. We are using whitelist filtering, in which FORGED_MUA_OUTLOOK hits whenever there is an unrecognized Message-ID for any reason except for the known good cases.

So how about we replace the eval rule __UNUSABLE_MSGID with a meta of a few header rules including the tests of __OE_MSGID_3 and __OE_MSGID_4?

Am I wrong in my assumption that we now can chain meta rules and the restriction in the comment no longer applies?

I've checked some test rules into my sandbox to see how it does. To do this I exposed a sub as a header eval rule:

svn ci -m "bug 5666: expose sub gated_through_received_hdr_remover() as an eval rule to allow moving the rest of check_messageid_not_usable from eval rules into a meta rule" lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
Sending lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
Transmitting file data .
Committed revision 611510.

Justin,

This is the only FP for FORGED_MUA_OUTLOOK in the last nightly run. It is in net-jm. Could you look at it and see if it really is ham, and send me a sanitised copy if possible?

/local/cor/recent/ham/priv.20070403/10

it's a ham: http://taint.org/x/2008/outlookfp.txt

(in reply to comment #25)
Do you have any insight as to how the Message-ID in that message got to be the way it is?
If it is either Mailman or the server that is handling that mailing list, I would expect that there would be more such FPs on that list.

Justin, here are three more spams I need to see -- There's a bug in my rule test.

Y 5 /local/cor/recent/spam/low.wall.200801110200/5
Y 5 /local/cor/recent/spam/low.wall.200801112200/19
Y 5 /local/cor/recent/spam/low.wall.200801120200/2

(In reply to comment #26)
> (in reply to comment #25)
>
> Do you have any insight as to how the Message-ID in that message got to be the
> way it is? If it is either Mailman or the server that is handling that mailing
> list, I would expect that there would be more such FPs on that list.
not a clue, I'm afraid.

here's those 3 spams: http://taint.org/x/2008/sidneyspams.tgz

Here are some things that I came across that could affect some details of these rules. I'm noting them here to make sure that they get documented somewhere.

I found a number of emails that hit both __OE_MSGID_2 and __OE_MSGID_4 because sympatico.ca sometimes (but not most of the time) did not remove the original Message-ID when adding a new one. Luckily that doesn't interfere with what these rules are looking for, but I can imagine someone writing rules that assume the existence of only one Message-ID header.

Something else I found is that a significant percentage of the __OE_MSGID_3 and __OE_MSGID_4 hits occur without an Outlook or Outlook Express mailer header. That indicates that the MTA adds a Message-ID header even when the MUA is not OL/OE. I think that means that the __OE_MSGID_3 and __OE_MSGID_4 tests should be made part of the UNUSABLE_MSGID subrule, not part of an OE-specific rule. They both indicate that an ISP is rewriting the Message-ID and therefore the Message-ID cannot be used as part of a spam-sign. If we do that the names should be changed to something that refers to the ISP and not OE.

I'm going to do some more testing with my sandbox to get this right.

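
As a side note on the duplicate Message-ID observation above, here is a small Python sketch of reading every Message-ID header from a message rather than assuming there is exactly one. It uses the standard library `email` package and has nothing to do with how SpamAssassin itself parses headers. The first Message-ID in the sample is a hypothetical stand-in for one set by the MUA; the second is the server-style ID from attachment 4140.

```python
from email import message_from_string

# Hypothetical headers: the first Message-ID stands in for one set by the
# MUA, the second is the server-style ID from attachment 4140.
RAW = (
    "Message-ID: <000001c80000$00000000$0100a8c0@example>\n"
    "Message-ID: <BAYC1-PASMTP13AE1E5C7CC71C6EDEB958CEAE0@CEZ.ICE>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

msg = message_from_string(RAW)
# get_all() returns every occurrence of the header, in message order.
message_ids = msg.get_all("Message-ID", [])
print(len(message_ids), message_ids)
```

Code that looks up only a single Message-ID value would see just one of the two, which is the assumption the comment above warns against.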
Created attachment 4252 [details]
Patch to register gated_through_received_hdr_remover as a header eval sub

I haven't looked at this for a bit, but the sandbox rules are doing the same or a little better than their equivalents, so I'm going to check the changes in to trunk.

Committed revision 619364.

Here are the relevant sandbox results:

8.1972 0.0079 0.999 T_SIDNEY_FORGED_MUA_OUTLOOK
8.1972 0.0091 0.999 FORGED_MUA_OUTLOOK
0.0998 0.0000 1.000 T_SIDNEY__FORGED_OUTLOOK_DOLLARS
0.0998 0.0012 0.988 __FORGED_OUTLOOK_DOLLARS
T_SIDNEY_FORGED_MUA_IMS same results as FORGED_MUA_IMS
T_SIDNEY_FORGED_MUA_OIMO same results as FORGED_MUA_OIMO
T_SIDNEY_FORGED_MUA_EUDORA same results as FORGED_MUA_EUDORA
T_SIDNEY_FORGED_MUA_MOZILLA same results as FORGED_MUA_MOZILLA

I would just go ahead with adding them to 3.2 rules also, but this requires one code change, so I need a vote. The change is to add one line to register sub gated_through_received_hdr_remover() for use in an eval rule. I have attached the patch for it.

Can I have votes for making that change in 3.2 branch?

+1 looks good

sure, +1

Committed to branch 3.2 revision 619567. I also committed the rule changes that use the new eval.