Bug 6657 - can't have both header existence test and header field modifier, e.g. exists:References:addr
Summary: can't have both header existence test and header field modifier, e.g. exists:...
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules (show other bugs)
Version: 3.3 SVN branch
Hardware: All All
: P2 minor
Target Milestone: 3.4.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-07 03:57 UTC by Adam Katz
Modified: 2012-01-17 18:25 UTC (History)
2 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status

Note You need to log in before you can comment on or make changes to this bug.
Description Adam Katz 2011-09-07 03:57:08 UTC
The following example rule is invalid:

header  __HAS_REFERENCES_ADDR  exists:References:addr


It generates:

lint: config: invalid regexp for rule __T_AK_HAS_REFERENCES_ADDR: : missing or invalid delimiters


I expected it to be identical to:

header  __HAS_REFERENCES_ADDR  References:addr =~ /^\S/


If this functionality is undesired, the documentation should forbid it.

(This is not a regression afaict.)
Comment 1 Karsten Bräckelmann 2011-09-07 17:24:30 UTC
Reading the docs again, I'm not even positive it hints this would be supported.

The exists: documentation states to test the header for existence -- the header itself. Clearly, even after stripping anything but the first email address as :addr does, this would not have an impact on the header's existence...
Comment 2 Karsten Bräckelmann 2011-09-07 18:31:50 UTC
Actually, the docs only specifically mention exists:Header and Header:addr as special header tests, with no word suggesting Header might be anything more complicated than a plain mail header.

The :addr one might be not that obvious, but exists:name_of_header specifically mentions a header name only.

IMHO, this does not need to be supported, since exists: merely is "a very simple version of the above header tests" (quoting the docs) and can easily be done with an RE rule. Moreover, I believe the docs do not specifically need to forbid this, since it doesn't suggest this to be supported.

Not a bug IMHO, closing RESOLVED INVALID accordingly. Please feel free to re-open the bug report, if you feel strongly about it.
Comment 3 Adam Katz 2011-09-07 18:52:09 UTC
(In reply to comment #2)
> Actually, the docs only specifically mention exists:Header and Header:addr as
> special header tests, with no word suggesting Header might be anything more
> complicated than a plain mail header.
> 
> The :addr one might be not that obvious, but exists:name_of_header specifically
> mentions a header name only.
> 
> IMHO, this does not need to be supported, since exists: merely is "a very
> simple version of the above header tests" (quoting the docs) and can easily be
> done with an RE rule. Moreover, I believe the docs do not specifically need to
> forbid this, since it doesn't suggest this to be supported.
> 
> Not a bug IMHO, closing RESOLVED INVALID accordingly. Please feel free to
> re-open the bug report, if you feel strongly about it.

I worry that "invalid regexp" and "missing or invalid delimiters" might be too vague for troubleshooting this issue.

Also, I think the docs are indeed ambiguous.

First, it is unclear if a blank header will match an exists: test.  Second, :addr is described as modifying the header for the current rule:

> Appending a modifier ":addr" to a header field name will cause everything
> except the first email address to be removed from the header field.

Assuming exists: does not match an empty (but still present) header, this implies that a header that does not contain an address is empty and therefore does not match exists: even if it has other data.
Comment 4 Mark Martinec 2011-09-08 00:02:36 UTC
> I worry that "invalid regexp" and "missing or invalid delimiters" might be too
> vague for troubleshooting this issue.
> 
> Also, I think the docs are indeed ambiguous.

Better?

trunk:
  Bug 6657: tighten header rule parsing syntax check, improve
  error message for a header rule like "exists:References:addr"
  Sending lib/Mail/SpamAssassin/Conf/Parser.pm
  Sending lib/Mail/SpamAssassin/Conf.pm
Committed revision 1166471.


> it is unclear if a blank header will match an exists: test.

Fixed documentation by the above commit. Now it says:

 header SYMBOLIC_TEST_NAME exists:header_field_name

  Define a header field existence test.  "header_field_name" is the
  name of a header field to test for existence.  Not to be confused
  with a test for an empty header field body, which can be
  implemented with a "header SYMBOLIC_TEST_NAME header_field_name op
  /pattern/modifiers" rule described above.


> Assuming 'exists:' does not match an empty (but still present) header

True, since 3.3.0.  See Bug 5965.
Comment 5 Mark Martinec 2011-09-08 00:13:26 UTC
Committed revision 1166476. Slight change of wording:

header SYMBOLIC_TEST_NAME exists:header_field_name

  Define a header field existence test.  "header_field_name" is the
  name of a header field to test for existence.  Not to be confused
  with a test for a nonempty header field body, which can be
  implemented by a "header SYMBOLIC_TEST_NAME header =~ /\S/" rule as
  described above.
Comment 6 Adam Katz 2011-09-10 00:31:25 UTC
I like the documentation changes, but I think "missing or invalid delimiters" was better than the new phrasing,

lint: config: SpamAssassin failed to parse line, "__HAS_REFERENCES_ADDR	exists:References:addr" does not specify a valid header field name for "header", skipping: header	__HAS_REFERENCES_ADDR	exists:References:addr at build/mkrules line 254.

$value and $line are nearly identical and therefore redundant.
Maybe something like:

  $parse_error = "config: SpamAssassin failed to parse line, \"$def\" ".
                  "does not specify a valid field name for \"$key\", ".
                  "skipping: $line";


Another note: even though this fails t/lint.t (but only on the first run since a 'make clean' or rules update), 'make test' does not fail and 'spamassassin --lint' still has no output (and -D doesn't add anything on this front).  Are these the desired outcomes?


Separately, and I know this was in the code before Mark touched it, I dislike our use of the regex character class   [!-9;-\176]   which matches the two ranges from "!" to "9" and from ";" to "\176" (~, tilde).  I had to look at a character map to even figure out that this is really just looking for any non-colon/non-space/non-extended printable characters (basically   [^:\s]   though I think other code has already handled the white space).  (An extra level of confusion: POSIX regexps that begin with an exclamation point are negated.)  Do we really need to look for extended or non-printing characters here?  Can we instead simplify this to   [^:]   or   [^:\s]   ?
Comment 7 Mark Martinec 2011-09-14 09:29:03 UTC
> lint: config: SpamAssassin failed to parse line, "__HAS_REFERENCES_ADDR   
> exists:References:addr" does not specify a valid header field name for
> "header", skipping: header    __HAS_REFERENCES_ADDR    exists:References:addr
> at build/mkrules line 254.

> $value and $line are nearly identical and therefore redundant.

Agreed, redundant.

> Maybe something like:
>   $parse_error = "config: SpamAssassin failed to parse line, \"$def\" ".
>                   "does not specify a valid field name for \"$key\", ".
>                   "skipping: $line";

The $def is unfortunately no longer available in Parser.pm where the
diagnostics is issued. So to avoid having to re-parse I just simplified
the text, it now reads:

$ spamassassin -L --lint
Sep 14 11:17:36.188 [94102] warn: config: SpamAssassin failed to parse line,
  it does not specify a valid header field name, skipping:
  header SYMBOLIC_TEST_NAME exists:From:addr

when encountering an illegal directive such as this.


> Another note: even though this fails t/lint.t (but only on the first run since
> a 'make clean' or rules update), 'make test' does not fail and 'spamassassin
> --lint' still has no output (and -D doesn't add anything on this front).  Are
> these the desired outcomes?

I don't understand what you are saying. The 'spamassassin --lint' does
produce output when there is an error in one of the configuration files.
Where is a t/lint.t ?


> Separately, and I know this was in the code before Mark touched it, I dislike
> our use of the regex character class   [!-9;-\176]   which matches the two
> ranges from "!" to "9" and from ";" to "\176" (~, tilde). I had to look at a
> character map to even figure out that this is really just looking for any
> non-colon/non-space/non-extended printable characters

The regexp was just transliterated from RFC 5322 section 3.6.8:
 
   field-name      =   1*ftext

   ftext           =   %d33-57 /          ; Printable US-ASCII
                       %d59-126           ;  characters not including
                                          ;  ":".

But I agree it is probably an overkill. Changed it to:

  } elsif ($header_name !~ /^([^: \t]+)$/) {  # be generous


trunk:
  Sending lib/Mail/SpamAssassin/Conf/Parser.pm
  Sending lib/Mail/SpamAssassin/Conf.pm
Committed revision 1170488.
Comment 8 Mark Martinec 2011-10-04 17:32:44 UTC
Can this be closed as fixed now?
Comment 9 Kevin A. McGrail 2012-01-17 18:25:25 UTC
Without additional comments, considering fixed with patches submitted.