Bug 6700 - URI Rule Type appears to fire on headers and it shouldn't...
Summary: URI Rule Type appears to fire on headers and it shouldn't...
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules (show other bugs)
Version: 3.3.2
Hardware: PC Windows 7
: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-14 17:30 UTC by Kevin A. McGrail
Modified: 2013-01-07 20:12 UTC (History)
1 user (show)



Attachment Type Modified Status Actions Submitter/CLA Status

Note You need to log in before you can comment on or make changes to this bug.
Description Kevin A. McGrail 2011-11-14 17:30:17 UTC
Based on http://wiki.apache.org/spamassassin/WritingRules "URI rules are very simple, they only match text in the URI's contained in plain text and HTML sections of mail. This is very handy for searching for links containing spam advertised sites. ", I would not expect a uri rule to fire on the DKIM-Signature header such as:

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mx.aol.com;
        s=20110426; t=1321033079;
        bh=0D8AlXp0FkxaAgPPEcBH3VntUpA5cQjWgCe1Mf3zaww=;
        h=From:To:Subject:Message-ID:Date:MIME-Version:Content-Type;
        b=LrAhdNkRec1Xk7c9YCkhHDk+6UCVklBNZN+R2IKmqNdevatKimK1+ercneP+MLdYy
         741TG6JlldQexs+r9U3N0Ze6GBueIhViPyF1EHN7Ps4lS6amMxVdDELUMpB4h10bAc
         ybXKHA5B/comAfiYOanp7rqgntjWGUD7WwQunoQI=

Perhaps the = is throwing it off?

In short, I don't consider headers part of the plain text or HTML section of the email and uri tests shouldn't be done on headers.

Is this correct behavior?

It appears bug 6317 reference a wish to do exactly the opposite so I believe this is a bug.

Anyone know if this is something to start digging into get_uri_list from PerMsgStatus?
Comment 1 Karsten Bräckelmann 2011-11-14 21:29:56 UTC
This appears to be specific to the DKIM-Signature header. Confirmed uri rules do hit on the d= value.

It does not trigger on arbitrary headers (prepended a Not- to the header name), neither when modifying the data to exhibit the URI on something else than d=.
Comment 2 Karsten Bräckelmann 2011-11-14 21:54:45 UTC
Seems to be deliberate.

See PerMsgStatus.pm get_uri_detail_list(), which deliberately harvests domains from DK/DKIM headers.
Comment 3 Kevin A. McGrail 2011-11-14 22:20:17 UTC
(In reply to comment #2)
> Seems to be deliberate.
> 
> See PerMsgStatus.pm get_uri_detail_list(), which deliberately harvests domains
> from DK/DKIM headers.

Thanks.  Good catch.  That explains the behavior I'm seeing.  This is undocumented and outside the scope of the rule type.  

OK, so going back as far as 3.2.5, the uri rule apparently purposefully also adds the domain from the DKIM/DomainKey Signature header.

So should the documentation be changed or the rule uri parsing?  To me, it seems VERY hackish that the only headers for URI rules are domainkeys/dkim.

Regards,
KAM
Comment 4 Mark Martinec 2011-11-15 01:19:21 UTC
> Thanks.  Good catch.  That explains the behavior I'm seeing.  This is
> undocumented and outside the scope of the rule type.   
> OK, so going back as far as 3.2.5, the uri rule apparently purposefully also
> adds the domain from the DKIM/DomainKey Signature header.
> So should the documentation be changed or the rule uri parsing?  To me, it
> seems VERY hackish that the only headers for URI rules are domainkeys/dkim.

Hackish indeed. No idea what was the rationale. I'd vote for throwing it out.
Comment 5 Karsten Bräckelmann 2011-11-15 02:04:26 UTC
Appears this has been added in revision 484699 and revision 484706. Doesn't reference a bug number or rationale.

Assuming doing so does not break or regress anything, I'd be +1 for removing it, too. According to the docs and my understanding of the harvested URIs, these should not come from headers (other than possibly Subject, which is treated as regular textual part in other places).

Regarding "regression": URI DNSBL rules are using the same harvested URIs. But then again, they don't harvest such headers either.
Comment 6 Karsten Bräckelmann 2011-11-15 02:12:20 UTC
An alternative would be to exclude  $parsed{$dom} == 'domainkeys'  for uri rules, since the source is clearly set.

However, this would at least require some carefully crafted code and possibly re-structuring of that code, to avoid payload URI equal to signing domain slip by. Currently, parsing the body is first, and an identical DKIM parsed domain could remove the URI from the list to check against URI DNSBLs in such an "exclusion" scenario.
Comment 7 Kevin A. McGrail 2013-01-07 20:12:58 UTC
Based on discussion, I removed the code that extracts URI from the DKIM signatures.

Index: lib/Mail/SpamAssassin/PerMsgStatus.pm
===================================================================
--- lib/Mail/SpamAssassin/PerMsgStatus.pm       (revision 1429128)
+++ lib/Mail/SpamAssassin/PerMsgStatus.pm       (working copy)
@@ -1989,14 +1989,19 @@
   # do this so we're sure metadata->html is setup
   my %parsed = map { $_ => 'parsed' } $self->_get_parsed_uri_list();
 
+
+  # This parses of DKIM for URIs disagrees with documentation and bug 6700 votes to disable
+  # this functionality
+  # 2013-01-07
+
   # Look for the domain in DK/DKIM headers
-  my $dk = join(" ", grep {defined} ( $self->get('DomainKey-Signature',undef),
-                                      $self->get('DKIM-Signature',undef) ));
-  while ($dk =~ /\bd\s*=\s*([^;]+)/g) {
-    my $dom = $1;
-    $dom =~ s/\s+//g;
-    $parsed{$dom} = 'domainkeys';
-  }
+  #my $dk = join(" ", grep {defined} ( $self->get('DomainKey-Signature',undef),
+  #                                    $self->get('DKIM-Signature',undef) ));
+  #while ($dk =~ /\bd\s*=\s*([^;]+)/g) {
+  #  my $dom = $1;
+  #  $dom =~ s/\s+//g;
+  #  $parsed{$dom} = 'domainkeys';
+  #}
 
   # get URIs from HTML parsing
   # use the metadata version since $self->{html} may not be setup

svn commit -m 'Disabled parsing of DKIM Header to extract URIs - Bug 6700' lib/Mail/SpamAssassin/PerMsgStatus.pm
Sending        lib/Mail/SpamAssassin/PerMsgStatus.pm
Transmitting file data .
Committed revision 1429988.