Bug 6271 - revamp DATE_IN_FUTURE_96_XX rules
Status: NEW
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.2.5
Hardware: PC Linux
Importance: P3 minor
Target Milestone: 3.4.1
Assigned To: SpamAssassin Developer Mailing List
Reported: 2010-01-01 13:15 UTC by Jonathan
Modified: 2013-06-21 16:09 UTC
CC List: 13 users



Description Jonathan 2010-01-01 13:15:52 UTC
+++ This bug was initially created as a clone of Bug #6269 +++

Promptly at the start of the new year, all mails started getting an extra 3.4 points based on FH_DATE_PAST_20XX:

header   FH_DATE_PAST_20XX      Date =~ /20[1-9][0-9]/ [if-unset: 2006]
describe FH_DATE_PAST_20XX      The date is grossly in the future.

It doesn't make much sense to have hardcoded dates like this in the rule-set.
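A minimal sketch of why the rule began firing at the turn of the year. The pattern is copied from the rule above; the two Date header values are hypothetical, chosen only to straddle midnight on 1 January 2010, and Python's `re` stands in for the Perl regex engine SpamAssassin actually uses.

```python
import re

# Pattern copied from the FH_DATE_PAST_20XX rule quoted above.
FH_DATE_PAST_20XX = re.compile(r"20[1-9][0-9]")


def rule_hits(date_header: str) -> bool:
    """Return True if the hardcoded-year pattern matches the Date header."""
    return FH_DATE_PAST_20XX.search(date_header) is not None


# Hypothetical Date header values on either side of the new year.
before = "Thu, 31 Dec 2009 23:59:59 +0000"
after = "Fri, 01 Jan 2010 00:00:01 +0000"

print(rule_hits(before))  # False: "2009" does not match 20[1-9][0-9]
print(rule_hits(after))   # True: "2010" matches, so every current mail hits
```

The pattern never looks at the current time, so "future" is fixed at whatever year the rule's author had in mind.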

And from further comments:

Comment 3 Per Jessen      2010-01-01 05:22:59 UTC

(In reply to comment #2)
> There was no need to comment it out. It was already fixed 5 months ago. :-)
> 
> http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/emailed/00_FVGT_File001.cf?r1=794319&r2=796216&diff_format=h

Not really much of a "fix" - more like a work-around that'll come back and bite
again in 10 years. "grossly in the future" is directly related to the current
time, so shouldn't this rule take the current time into account?

Comment 4 Henrik Krohns 2010-01-01 05:27:17 UTC

Right. But we have years of time to fix it.

And it appears there is already an eval function for it: check_for_shifted_date
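The idea raised in the quoted comments (compare the Date header against a reference time rather than a hardcoded year) can be sketched as follows. This is an illustration only, not the implementation of check_for_shifted_date: the function names are hypothetical, the 96-hour default is taken from the DATE_IN_FUTURE_96_XX rule name, and the header values are made up.

```python
from email.utils import parsedate_to_datetime


def hours_ahead(date_header: str, received_date: str) -> float:
    """Hours by which the Date header is ahead of the Received timestamp."""
    sent = parsedate_to_datetime(date_header)
    received = parsedate_to_datetime(received_date)
    return (sent - received).total_seconds() / 3600.0


def grossly_in_future(date_header: str, received_date: str,
                      threshold_hours: float = 96.0) -> bool:
    """True if Date is at least threshold_hours after the Received time."""
    return hours_ahead(date_header, received_date) >= threshold_hours


# Hypothetical timestamps; both carry an explicit UTC offset.
received = "Fri, 01 Jan 2010 13:15:52 +0000"
print(grossly_in_future("Fri, 01 Jan 2010 13:15:52 +0000", received))  # False
print(grossly_in_future("Wed, 06 Jan 2010 13:15:52 +0000", received))  # True
```

Because the comparison is relative, the same check keeps working in 2020 and beyond without edits to the rule.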
Comment 1 Justin Mason 2010-01-01 14:00:45 UTC
better, we should just discard the rule; as bug 5852 notes, DATE_IN_FUTURE_96_XX already takes care of this.
Comment 2 Adam Katz 2010-01-01 18:32:52 UTC
THIS NEEDS AN IMMEDIATE FIX FOR 3.2.5

*EVERY* email triggers it because the date is 2010.

sa-update needs a push immediately.  Just go with the svn version that bumps it to 2020+ and deal with the rest of it later.
Comment 3 Adam Katz 2010-01-01 18:52:57 UTC
(In reply to comment #2)
> THIS NEEDS AN IMMEDIATE FIX FOR 3.2.5
> 
> *EVERY* email triggers it because the date is 2010.
> 
> sa-update needs a push immediately.  Just go with the svn version that bumps it
> to 2020+ and deal with the rest of it later.

Retracted, this is already taken care of.

I hadn't waited for sa-update to complete in my checking before this post and I was too blindly trusting thunderbird3's new search feature, which hadn't properly found the recent related threads.
Comment 4 Adam Katz 2010-01-15 16:16:26 UTC
I've created rulesrc/sandbox/khopesh/bug_6271.cf in r899846 as an experiment for long-term replacement options for this rule, all inspired by bug 6269 comment 25.

I fully expect this to include too many steps, most of which will never see hits.  Likely we'll either conclude the single step (96h+) is enough or one or two steps beyond it is optimal.  FH_DATE_PAST_20XX's high scores (1.536 2.699 2.390 2.564 as of r891460 on 20091216) certainly make it seem worthwhile to the GA...
Comment 5 Adam Katz 2010-01-25 15:49:54 UTC
http://ruleqa.spamassassin.org/week?srcpath=bug_6271
(Is there a better way to view results averaged over larger periods of time, or must I 'graph hit-rate over time' for each individual rule?)
Here are yesterday's results:

MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME
    0   0.0913        0   1.000    0.54    0.01  T_DATE_IN_FUTURE_96_WEEK
    0   0.0493   0.0018   0.966    0.51    0.01  T_DATE_IN_FUTURE_WEEK
    0   0.0376   0.0011   0.973    0.51    0.01  T_DATE_IN_FUTURE_MONTH
    0   0.0143   0.0004   0.976    0.49    0.01  T_DATE_IN_FUTURE_YEAR
    0   0.0081        0   1.000    0.49    0.01  T_DATE_IN_FUTURE_1Y_4Y
    0   0.2445   0.0004   0.999    0.62    0.01  T_DATE_IN_DISTANT_FUTURE

Despite the precedent of the other DATE_IN_ rules, it appears there is overlap between check_for_shifted_date(x,y) and check_for_shifted_date(y,z) at the shared boundary value y.  I have incremented each starting value so as to eliminate this.
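The boundary concern can be illustrated in isolation. The sketch below makes no claim about how check_for_shifted_date actually treats its endpoints; it only shows that two adjacent ranges sharing an endpoint both match that endpoint if both ends are inclusive, and that either a half-open range or an incremented start value removes the double hit. The hour values are hypothetical.

```python
def in_closed_range(hours: int, lo: int, hi: int) -> bool:
    """Both endpoints inclusive: lo <= hours <= hi."""
    return lo <= hours <= hi


def in_half_open_range(hours: int, lo: int, hi: int) -> bool:
    """Lower endpoint inclusive, upper exclusive: lo <= hours < hi."""
    return lo <= hours < hi


boundary = 168  # hypothetical shared boundary (one week, in hours)

# Inclusive ranges (96, 168) and (168, 720) both match the boundary.
closed_hits = [in_closed_range(boundary, 96, 168),
               in_closed_range(boundary, 168, 720)]

# Half-open ranges match it exactly once.
half_open_hits = [in_half_open_range(boundary, 96, 168),
                  in_half_open_range(boundary, 168, 720)]

# Incrementing the second start value also removes the overlap.
shifted_hits = [in_closed_range(boundary, 96, 168),
                in_closed_range(boundary, 169, 720)]

print(closed_hits, half_open_hits, shifted_hits)
```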

(In reply to myself, from comment #4)
> I fully expect this to include too many steps, most of which will never see
> hits.

Yup, looks like the month to 4-yr time-frame doesn't hit much spam and the ham dies down at around the 4-month mark (the end of _FUTURE_MONTH), so it's safe to fold everything past that together.

I had been worried about people who have the right date and time but the wrong year, but as the days wouldn't line up, I guess that's not so much of a concern.  My experience with FPs in these rules has mostly been related to time-zone issues, which certainly aren't a factor past the 96h threshold.

Based on this, my current thinking is (with names ensuring asciibetical order):

# Expected stats: 0.1782/0.0029 Spam%/Ham% at 0.984 S/O
header   DATE_IN_FUTURE_96_Q eval:check_for_shifted_date('96', '2920')
describe DATE_IN_FUTURE_96_Q Date: is 4 days to 4 months after Received: date
tflags   DATE_IN_FUTURE_96_Q nopublish # remove 96_XX before publishing this!

# Expected stats: 0.2469/0.0008 Spam%/Ham% at 0.997 S/O and a higher 'Rank'
header   DATE_IN_FUTURE_Q_PLUS eval:check_for_shifted_date('2920', 'undef')
describe DATE_IN_FUTURE_Q_PLUS Date: is over 4 months after Received: date
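As a rough illustration of how the two proposed rules partition the offset between Date: and Received:, the sketch below maps an offset in hours to the rule expected to fire. The function name is hypothetical and the half-open boundary handling is an assumption; the real behavior is whatever check_for_shifted_date does with its arguments.

```python
from typing import Optional

FOUR_DAYS_HOURS = 96   # lower bound from the proposed DATE_IN_FUTURE_96_Q rule
Q_HOURS = 2920         # upper bound from the same rule; 2920 / 24 is about 121.7 days


def proposed_rule(hours_ahead: float) -> Optional[str]:
    """Name of the proposed rule expected to fire, or None if neither would.

    Assumes each range includes its lower bound and excludes its upper bound.
    """
    if hours_ahead < FOUR_DAYS_HOURS:
        return None
    if hours_ahead < Q_HOURS:
        return "DATE_IN_FUTURE_96_Q"
    return "DATE_IN_FUTURE_Q_PLUS"


print(proposed_rule(12))     # None: within normal time-zone slop
print(proposed_rule(500))    # DATE_IN_FUTURE_96_Q
print(proposed_rule(10000))  # DATE_IN_FUTURE_Q_PLUS
```

The 2920-hour cutoff works out to roughly a third of a year, which matches the "4 months" wording in the rule descriptions.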
Comment 6 Adam Katz 2010-01-29 12:16:51 UTC
Stats from comment #5 were roughly the same as predicted, with the break-down already justifying moving forward.  I think this will have an overall good impact on _96_XX's FPs without impacting the bulk of its caught spam.


Proposed steps:
1. Remove DATE_IN_FUTURE_96_XX from trunk. (FH_DATE_PAST_20XX is already removed)
2. Wait for/push the GA to score DATE_IN_FUTURE_96_Q and DATE_IN_FUTURE_Q_PLUS.
3. Replace the two removed rules with the two new rules on all branches.

This will likely entail two voting periods: one for step 1, and then one for step 2 once the numbers are in.
Comment 7 Adam Katz 2010-02-05 15:07:40 UTC
Looking for two voting periods: one to move forward on my proposal from comment 6 and then, if the resulting GA scores don't look so hot, a second one to determine whether we want to push it further or just lapse back to DATE_IN_FUTURE_96_XX.
Comment 8 Sidney Markowitz 2010-02-05 15:35:40 UTC
+1 on moving forward with the proposal in comment #6 deciding based on results what to do next.
Comment 9 Jonathan 2010-02-06 01:28:25 UTC
(In reply to comment #8)
> +1 on moving forward with the proposal in comment #6 deciding based on results
> what to do next.

+1 from me as well.
Comment 10 Adam Katz 2010-02-08 17:41:38 UTC
help wanted

I'm not sure how to properly remove a rule.  I can remove it from rulesrc/10_force_active.cf, but should I also modify rules/20_head_tests.cf and rules/50_scores.cf?  Also, we'll need new translations of the description.
Comment 11 Justin Mason 2010-02-12 17:31:46 UTC
(In reply to comment #10)
> help wanted
> 
> I'm not sure how to properly remove a rule.  I can remove it from
> rulesrc/10_force_active.cf, but should I also modify rules/20_head_tests.cf and
> rules/50_scores.cf?  Also, we'll need new translations of the description.

the safest thing is simply to replace with

    meta NAMEOFRULE (0)
    score NAMEOFRULE 0

if you're worried about causing problems.   it'll never hit, so there's
no need for a translation.
Comment 12 Justin Mason 2010-03-23 16:33:30 UTC
moving all open 3.3.1 bugs to 3.3.2
Comment 13 Karsten Bräckelmann 2010-03-23 17:42:40 UTC
Moving back off of Security, which got changed by accident during the mass Target Milestone move.
Comment 14 Kevin A. McGrail 2013-06-21 16:09:40 UTC
Moving all open bugs where target is defined and 3.4.0 or lower to 3.4.1 target