Bug 6908 - Forged headers are poisoning AWL database
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins
Version: 3.3 SVN branch
Hardware: PC Linux
Importance: P2 critical
Target Milestone: 3.4.1
Assignee: SpamAssassin Developer Mailing List
Reported: 2013-02-15 21:19 UTC by David Hill
Modified: 2015-04-06 21:59 UTC

Description David Hill 2013-02-15 21:19:31 UTC
There seems to be a bug in which the wrong Received header entry is used for AWL validation, and this is poisoning the AWL database for LinkedIn entries (and most probably for other domains too).

Spamassassin debug output for a sample mail:
Feb 15 20:56:28.321 [12654] dbg: auto-whitelist: tie-ing to DB file of type DB_File R/W in /var/spool/MailScanner/spamassassin/auto-whitelist
Feb 15 20:56:28.321 [12654] dbg: auto-whitelist: IP masking 199.101.160.34 -> 199.101
Feb 15 20:56:28.322 [12654] dbg: auto-whitelist: db-based emailconfirm@linkedin.com|ip=199.101 scores 5/356.94
Feb 15 20:56:28.322 [12654] dbg: auto-whitelist: AWL active, pre-score: 61.453, autolearn score: 61.453, mean: 71.388, IP: 199.101.160.34, address: emailconfirm@linkedin.com (not signed)
Feb 15 20:56:28.322 [12654] dbg: auto-whitelist: add_score: new count: 6, new totscore: 418.393
Feb 15 20:56:28.322 [12654] dbg: auto-whitelist: DB addr list: untie-ing and unlocking
Feb 15 20:56:28.323 [12654] dbg: auto-whitelist: DB addr list: file locked, breaking lock


Original Header:
Received: from cust241-38.148.197.netcabo.co.ao (unknown [197.148.38.241])
        by mail01.ubisoft.com (Postfix) with ESMTP id 53E2A6662F
        for <mzemir@ubisoft.qc.ca>; Fri, 15 Feb 2013 14:11:22 +0000 (GMT)
Received: from maila-cb.linkedin.com ([199.101.160.34]) by mailstore1.secureserver.net;
         Fri, 15 Feb 2013 05:11:22 +0100
Sender: messages-noreply@bounce.linkedin.com
Date: Fri, 15 Feb 2013 05:11:22 +0100
From: LinkedIn Email Confirmation <emailconfirm@linkedin.com>
To: mzemir <mzemir@ubisoft.qc.ca>
Message-ID: <837484192.8705793.8421345197542.JavaMail.app@ela7-app6760.prod>
Subject: [SPAM] Re: Scan from a HP ScanJet  #50553759
MIME-Version: 1.0
Content-Type: multipart/mixed;
        boundary="----=_Part_0120519_7213332464.6743027885770"
X-LinkedIn-Template: email_confirm
X-LinkedIn-Class: ACCT-ADMIN
X-LinkedIn-fbl: s-XAXOQT9M053YRH5XV3YRVB56M4AIKNSWIXM14J-75MNC1CJ74O8J4R
X-OriginalArrivalTime: Fri, 15 Feb 2013 05:11:22 +0100 FILETIME=[6CE9726C:41380A5D]
X-Spam-Prev-Subject: Re: Scan from a HP ScanJet  #50553759


I was hunting a problem where I deleted the LinkedIn AWL entries for 199.101 IPs and they kept reappearing within minutes with high scores, even though all the genuine LinkedIn mails score low on average. I've tracked it down to messages like this one, which reset my LinkedIn AWL entries to high totals and high averages. AWL is recording the scores against the wrong relay IP.
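
To make the numbers in the debug output above easier to follow, here is a minimal sketch of the bookkeeping they imply. It is written in Python purely as a model; the function names are hypothetical and are not SpamAssassin APIs. It only reproduces what the log shows: the IP is masked to its first two octets, the entry is keyed on address plus masked IP, the mean is totscore divided by count, and add_score adds the current message's score to the running total.

```python
# Model of the AWL bookkeeping visible in the debug log (names are hypothetical).

def mask_ip(ip):
    """Keep the first two octets, as in 'IP masking 199.101.160.34 -> 199.101'."""
    return ".".join(ip.split(".")[:2])

def awl_key(address, ip):
    """Build the 'address|ip=a.b' key format shown in the log."""
    return f"{address}|ip={mask_ip(ip)}"

def add_score(count, totscore, score):
    """Add one message's score to the stored entry."""
    return count + 1, totscore + score

# Values taken from the log: the entry 'scores 5/356.94', message pre-score 61.453.
key = awl_key("emailconfirm@linkedin.com", "199.101.160.34")
count, totscore = 5, 356.94
mean = totscore / count
new_count, new_totscore = add_score(count, totscore, 61.453)

print(key)
print(round(mean, 3), new_count, round(new_totscore, 3))
```

Because the key is built from the forged relay's masked IP, every such spam lands in the same entry as genuine LinkedIn mail from 199.101.x.x, which is consistent with the entry reappearing with a high average after being deleted.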
Comment 1 Darxus 2013-02-15 21:23:57 UTC
Can you provide spamassassin's debug output from parsing those headers?

You don't have 197.148.38.241 configured as a trusted / internal relay, right?
Comment 2 David Hill 2013-02-15 21:27:13 UTC
(In reply to comment #1)
> Can you provide spamassassin's debug output from parsing those headers?
> 
> You don't have 197.148.38.241 configured as a trusted / internal relay,
> right?


I made sure it wasn't configured as a trusted / internal relay before creating this bug report, as I had issues with that in the past ;)

I'll send the output to the email address I see in the bug report.
Comment 3 David Hill 2013-02-15 21:32:11 UTC
(In reply to comment #2)
You should have received it by now. As you'll see, the relay is reported as untrusted in the debug output!
Comment 4 Darxus 2013-02-15 21:40:27 UTC
He emailed me the debug output. I don't think there's anything more private here than has already been provided:

Feb 15 21:26:07.290 [22499] dbg: config: internal_networks not configured, using trusted_networks configuration for internal_networks; if you really want internal_networks to only contain the required 127/8 add 'internal_networks !0/0' to your configuration

Feb 15 21:26:07.291 [22499] dbg: received-header: parsed as [ ip=197.148.38.241 rdns= helo=cust241-38.148.197.netcabo.co.ao by=mail01.ubisoft.com ident= envfrom= intl=0 id=53E2A6662F auth= msa=0 ]
Feb 15 21:26:07.292 [22499] dbg: received-header: relay 197.148.38.241 trusted? no internal? no msa? no

Feb 15 21:26:07.299 [22499] dbg: received-header: parsed as [ ip=199.101.160.34 rdns= helo=maila-cb.linkedin.com by=mailstore1.secureserver.net ident= envfrom= intl=0 id= auth= msa=0 ]
Feb 15 21:26:07.299 [22499] dbg: received-header: relay 199.101.160.34 trusted? no internal? no msa? no

Feb 15 21:26:07.300 [22499] dbg: metadata: X-Spam-Relays-Trusted:
Feb 15 21:26:07.300 [22499] dbg: metadata: X-Spam-Relays-Untrusted: [ ip=197.148.38.241 rdns= helo=cust241-38.148.197.netcabo.co.ao by=mail01.ubisoft.com ident= envfrom= intl=0 id=53E2A6662F auth= msa=0 ] [ ip=199.101.160.34 rdns= helo=maila-cb.linkedin.com by=mailstore1.secureserver.net ident= envfrom= intl=0 id= auth= msa=0 ]

Feb 15 21:26:07.300 [22499] dbg: metadata: X-Spam-Relays-Internal:
Feb 15 21:26:07.300 [22499] dbg: metadata: X-Spam-Relays-External: [ ip=197.148.38.241 rdns= helo=cust241-38.148.197.netcabo.co.ao by=mail01.ubisoft.com ident= envfrom= intl=0 id=53E2A6662F auth= msa=0 ] [ ip=199.101.160.34 rdns= helo=maila-cb.linkedin.com by=mailstore1.secureserver.net ident= envfrom= intl=0 id= auth= msa=0 ]


So 197.148.38.241, which should be used by AWL, shows up just fine as the last untrusted and last external relay.  It's also the IP used for checking blacklists.  Looks weird to me.
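
For anyone who wants to inspect the relay metadata outside SpamAssassin, a rough sketch of splitting the X-Spam-Relays-Untrusted string from the log above into one record per relay follows. It is a Python model, not SpamAssassin code; parse_relays is a hypothetical helper, and it assumes field values contain no spaces, which holds for this sample.

```python
import re

def parse_relays(metadata):
    """Split '[ key=value ... ] [ key=value ... ]' into a list of dicts (hypothetical helper)."""
    relays = []
    for block in re.findall(r"\[([^\]]*)\]", metadata):
        fields = {}
        for token in block.split():
            # Each token looks like 'ip=197.148.38.241' or 'rdns=' (empty value).
            key, _, value = token.partition("=")
            fields[key] = value
        relays.append(fields)
    return relays

# The X-Spam-Relays-Untrusted value copied from the debug output.
untrusted = (
    "[ ip=197.148.38.241 rdns= helo=cust241-38.148.197.netcabo.co.ao "
    "by=mail01.ubisoft.com ident= envfrom= intl=0 id=53E2A6662F auth= msa=0 ] "
    "[ ip=199.101.160.34 rdns= helo=maila-cb.linkedin.com "
    "by=mailstore1.secureserver.net ident= envfrom= intl=0 id= auth= msa=0 ]"
)

relays = parse_relays(untrusted)
for rly in relays:
    print(rly["ip"], "->", rly["by"])
```

The records come out in the same order as the Received headers in the message, so the relay that handed the mail to mail01.ubisoft.com is the first entry and the claimed LinkedIn relay is the second.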
Comment 5 David Hill 2013-02-15 22:02:51 UTC
Hello, 

   I'm not quite sure, but if the relays are stored in the order they appear in the headers, could the "reverse" be the culprit?


Dave

<<<SNIP>>>
    foreach my $rly (reverse (@{$pms->{relays_trusted}}, @{$pms->{relays_untrusted}}))
    {
      next if ($rly->{ip_private});
      if ($rly->{ip}) {
        $origip = $rly->{ip}; last;
      }
    }
<<</SNIP>>>
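
To show the effect of the "reverse" on this particular message, here is a small Python model of the loop above. It is only a sketch: pick_origin_ip and the use_reverse flag are hypothetical names, and the relay records are reduced to the fields the loop actually reads.

```python
def pick_origin_ip(relays_trusted, relays_untrusted, use_reverse=True):
    """Model of the snippet: return the first usable IP in the (optionally reversed) relay list."""
    relays = list(relays_trusted) + list(relays_untrusted)
    if use_reverse:
        relays.reverse()  # walk from the relay furthest from us back toward our server
    for rly in relays:
        if rly.get("ip_private"):
            continue  # skip private / unrouted addresses
        if rly.get("ip"):
            return rly["ip"]
    return None

# Relays in header order, as listed in X-Spam-Relays-Untrusted for the sample mail.
trusted = []
untrusted = [
    {"ip": "197.148.38.241", "ip_private": False},  # connected to mail01.ubisoft.com
    {"ip": "199.101.160.34", "ip_private": False},  # only claimed by a lower Received header
]

print(pick_origin_ip(trusted, untrusted, use_reverse=True))
print(pick_origin_ip(trusted, untrusted, use_reverse=False))
```

With the reverse, the loop returns the address taken from the lower, unverifiable Received header (199.101.160.34); without it, the loop returns the relay that actually delivered to our server (197.148.38.241).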


Comment 6 David Hill 2013-02-15 22:09:29 UTC
Hello again,

   If I remove the "reverse", it works... but I don't know what else I may have broken!

Feb 15 22:08:36.573 [4173] dbg: auto-whitelist: tie-ing to DB file of type DB_File R/W in /var/spool/MailScanner/spamassassin/auto-whitelist
Feb 15 22:08:36.574 [4173] dbg: auto-whitelist: IP masking 197.148.38.241 -> 197.148
Feb 15 22:08:36.574 [4173] dbg: auto-whitelist: db-based emailconfirm@linkedin.com|ip=197.148 scores 0/0
Feb 15 22:08:36.574 [4173] dbg: auto-whitelist: db-based emailconfirm@linkedin.com|ip=none scores 0/0
Feb 15 22:08:36.574 [4173] dbg: auto-whitelist: AWL active, pre-score: 61.453, autolearn score: 61.453, mean: undef, IP: 197.148.38.241, address: emailconfirm@linkedin.com (not signed)
Feb 15 22:08:36.574 [4173] dbg: auto-whitelist: add_score: new count: 1, new totscore: 61.453
Feb 15 22:08:36.574 [4173] dbg: auto-whitelist: DB addr list: untie-ing and unlocking
Feb 15 22:08:36.575 [4173] dbg: auto-whitelist: DB addr list: file locked, breaking lock



Dave

Comment 7 David Hill 2013-02-16 02:14:55 UTC
<<<SNIP>>>
AWL base IP address is a way to identify the sender's IP address they frequently send from, in an approximate way, but remaining hard for spammers to spoof. The algorithm is as follows:

  - take the last Received header that contains a public IP address -- namely
    one which is not in private, unrouted IP space.
  - chop off the last two octets, assuming that the user may be in an ISP's
    dynamic address pool.
<<</SNIP>>>



If the last header is spoofed, as is the case here, we AWL the wrong IP.
So the code is doing exactly what is documented. But I'm wondering: why aren't we taking the first hop instead (the relay that actually connected to our server)? Domains that relay spam should be blacklisted IMHO, so doing it the other way around would naturally penalize the spammy domains. They would have to take their spam problem seriously if they are an ISP...

Or simply remove AWL, because it's now exploitable... but I "like" my patch! If Hotmail sends me half spam and half ham, theoretically I would still get my mail. Don't you think?
Comment 8 Kevin A. McGrail 2013-06-21 14:53:19 UTC
Moving to 3.4.1 target
Comment 9 Ivo Truxa 2014-03-08 02:52:55 UTC
AWL does indeed search for the originating IP: walking the chain from the top of the headers, it takes the last public IP. David is right that this is pointless, because that IP can easily be spoofed. However, simply reversing the array traversal cannot work correctly either, because we would then get the first public IP, which is also not what we want. We want the first untrusted IP, and only if that is not available (for example, when everything comes from trusted networks) do we take the last trusted IP.

After reviewing this issue, I implemented the IP search algorithm this way in the TxRep plugin (revision 1.0.5). See https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7021 for details on TxRep (a proposed replacement for AWL).

However, it would still be much better for the ranking to get closer to the origin through the untrusted relays. One possibility is on the user side: the administrator should make sure the trusted networks and all possible relaying hosts are well defined. Additionally, for low-scoring messages we can assume nobody would spoof ham email to improve the score of a spoofed good address, so in that case we can trust even the untrusted relays. So when the score is lower than 2.0, TxRep will walk through the untrusted relays like AWL always did. The value 2.0 is hardcoded. I wanted to avoid too many settings, but it could easily be made configurable too. Although not perfect, it should help identify at least the good senders better.

Have a look at the change, test it, and let me know whether it works as intended and whether it is an acceptable solution.
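
The selection rule described in this comment can be sketched roughly as follows. This is a Python model of one reading of the description, not the TxRep source: select_origin_ip is a hypothetical name, relay lists are assumed to be in header order (closest to our server first), and "last trusted IP" is assumed to mean the last usable entry of the trusted list.

```python
LOW_SCORE_THRESHOLD = 2.0  # the hardcoded value mentioned in the comment

def _public_ips(relays):
    """IPs of relays that have an address and are not in private space."""
    return [r["ip"] for r in relays if r.get("ip") and not r.get("ip_private")]

def select_origin_ip(relays_trusted, relays_untrusted, score):
    """Pick the IP used for reputation tracking (sketch under the stated assumptions)."""
    if score < LOW_SCORE_THRESHOLD:
        # Low-scoring mail: walk through all relays like AWL did and use the deepest public IP.
        ips = _public_ips(list(relays_trusted) + list(relays_untrusted))
        if ips:
            return ips[-1]
    # Otherwise prefer the first untrusted relay, which connected to a host we trust.
    untrusted_ips = _public_ips(relays_untrusted)
    if untrusted_ips:
        return untrusted_ips[0]
    # Everything came from trusted networks: fall back to the last trusted IP.
    trusted_ips = _public_ips(relays_trusted)
    return trusted_ips[-1] if trusted_ips else None

# The sample mail from this bug: no trusted relays, two untrusted ones.
untrusted = [{"ip": "197.148.38.241"}, {"ip": "199.101.160.34"}]
print(select_origin_ip([], untrusted, score=61.453))
print(select_origin_ip([], untrusted, score=1.0))
```

For the forged message in this report (score 61.453), the sketch picks 197.148.38.241, so the spam no longer shares an entry with genuine mail from 199.101.x.x. Only a message scoring below 2.0 would still be tracked under the deeper relay.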
Comment 10 Kevin A. McGrail 2015-04-06 21:59:56 UTC
Please test TxRep.