SA Bugzilla – Bug 6271
revamp DATE_IN_FUTURE_96_XX rules
Last modified: 2015-04-12 15:22:36 UTC
+++ This bug was initially created as a clone of Bug #6269 +++

Promptly at the start of the new year, all mails started getting an extra 3.4 points based on FH_DATE_PAST_20XX:

header FH_DATE_PAST_20XX Date =~ /20[1-9][0-9]/ [if-unset: 2006]
describe FH_DATE_PAST_20XX The date is grossly in the future.

It doesn't make much sense to have hardcoded dates like this in the rule-set.

And from further comments:

Comment 3 Per Jessen 2010-01-01 05:22:59 UTC
(In reply to comment #2)
> There was no need to comment it out. It was already fixed 5 months ago. :-)
>
> http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/emailed/00_FVGT_File001.cf?r1=794319&r2=796216&diff_format=h

Not really much of a "fix" - more like a work-around that'll come back and bite again in 10 years. "grossly in the future" is directly related to the current time, so shouldn't this rule take the current time into account?

Comment 4 Henrik Krohns 2010-01-01 05:27:17 UTC
Right. But we have years of time to fix it. And it appears there is already an eval function for it: check_for_shifted_date
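To make the complaint above concrete, here is a minimal Python sketch of why the hard-coded pattern started matching every message once the year reached 2010, next to a check that is relative to a reference time instead. The regular expression is copied from the quoted rule; the function names, the `reference` parameter, and the 96-hour default (borrowed from the DATE_IN_FUTURE_96_XX name) are assumptions for illustration and are not SpamAssassin's implementation.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Pattern copied verbatim from the FH_DATE_PAST_20XX rule quoted above.
HARDCODED_YEAR = re.compile(r"20[1-9][0-9]")


def matches_hardcoded_rule(date_header: str) -> bool:
    """True when the Date header contains any year from 2010 to 2099."""
    return HARDCODED_YEAR.search(date_header) is not None


def is_grossly_in_future(date_header: str, reference: datetime,
                         threshold_hours: float = 96) -> bool:
    """Hypothetical relative check: Date is more than threshold_hours
    after the reference time (for example, the time of receipt)."""
    sent = parsedate_to_datetime(date_header)
    return sent - reference > timedelta(hours=threshold_hours)


# A perfectly ordinary message sent and received on 2010-01-01.
received = datetime(2010, 1, 1, 6, 0, tzinfo=timezone.utc)
header = "Fri, 01 Jan 2010 05:22:59 +0000"
hardcoded_hit = matches_hardcoded_rule(header)
relative_hit = is_grossly_in_future(header, received)
```

In this sketch the hard-coded pattern fires on the ordinary message while the relative check does not, which is the behaviour the comments above ask for.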
better, we should just discard the rule; as bug 5852 notes, DATE_IN_FUTURE_96_XX already takes care of this.
THIS NEEDS AN IMMEDIATE FIX FOR 3.2.5 *EVERY* email triggers it because the date is 2010. sa-update needs a push immediately. Just go with the svn version that bumps it to 2020+ and deal with the rest of it later.
(In reply to comment #2)
> THIS NEEDS AN IMMEDIATE FIX FOR 3.2.5
>
> *EVERY* email triggers it because the date is 2010.
>
> sa-update needs a push immediately. Just go with the svn version that bumps it
> to 2020+ and deal with the rest of it later.

Retracted, this is already taken care of. I hadn't waited for sa-update to complete in my checking before this post and I was too blindly trusting thunderbird3's new search feature, which hadn't properly found the recent related threads.
I've created rulesrc/sandbox/khopesh/bug_6271.cf in r899846 as an experiment for long-term replacement options for this rule, all inspired by bug 6269 comment 25.

I fully expect this to include too many steps, most of which will never see hits. Likely we'll either conclude the single step (96h+) is enough or one or two steps beyond it is optimal. FH_DATE_PAST_20XX's high scores (1.536 2.699 2.390 2.564 as of r891460 on 20091216) certainly make it seem worthwhile to the GA...
http://ruleqa.spamassassin.org/week?srcpath=bug_6271

(Is there a better way to view results averaged over larger periods of time, or must I 'graph hit-rate over time' for each individual rule?)

Here are yesterday's results:

MSECS  SPAM%   HAM%    S/O    RANK  SCORE  NAME
0      0.0913  0       1.000  0.54  0.01   T_DATE_IN_FUTURE_96_WEEK
0      0.0493  0.0018  0.966  0.51  0.01   T_DATE_IN_FUTURE_WEEK
0      0.0376  0.0011  0.973  0.51  0.01   T_DATE_IN_FUTURE_MONTH
0      0.0143  0.0004  0.976  0.49  0.01   T_DATE_IN_FUTURE_YEAR
0      0.0081  0       1.000  0.49  0.01   T_DATE_IN_FUTURE_1Y_4Y
0      0.2445  0.0004  0.999  0.62  0.01   T_DATE_IN_DISTANT_FUTURE

Despite the precedent of the other DATE_IN_ rules, it appears there is overlap between check_for_shifted_date(x,y) and check_for_shifted_date(y,z) for Y. I have incremented each starting value so as to eliminate this.

(In reply to myself, from comment #4)
> I fully expect this to include too many steps, most of which will never see
> hits.

Yup, looks like the month to 4-yr time-frame doesn't hit much spam and the ham dies down at around the 4-month mark (the end of _FUTURE_MONTH), so it's safe to fold everything past that together.

I had been worried about people who have the right date and time but the wrong year, but as the days wouldn't line up, I guess that's not so much of a concern. My experience with FPs in these rules has mostly been related to time-zone issues, which certainly aren't a factor past the 96h threshold.

Based on this, my current thinking is (with names ensuring asciibetical order):

# Expected stats: 0.1782/0.0029 Spam%/Ham% at 0.984 S/O
header   DATE_IN_FUTURE_96_Q eval:check_for_shifted_date('96', '2920')
describe DATE_IN_FUTURE_96_Q Date: is 4 days to 4 months after Received: date
tflags   DATE_IN_FUTURE_96_Q nopublish # remove 96_XX before publishing this!
# Expected stats: 0.2469/0.0008 Spam%/Ham% at 0.997 S/O and a higher 'Rank'
header   DATE_IN_FUTURE_Q_PLUS eval:check_for_shifted_date('2920', 'undef')
describe DATE_IN_FUTURE_Q_PLUS Date: is over 4 months after Received: date
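The two proposed rules split the "Date is after Received" offset into adjacent hour ranges. The following Python sketch models that split so the boundary behaviour can be reasoned about. It assumes half-open ranges (lower bound inclusive, upper bound exclusive) and uses `None` in place of 'undef'; both the function `shifted_date_hit` and that interval convention are assumptions made for illustration. The comment above reports overlap at the shared boundary in practice, so the real eval's behaviour should be checked against the SpamAssassin source rather than inferred from this model.

```python
from typing import List, Optional, Tuple

# Hour ranges taken from the proposed rules above; None stands in for 'undef'.
PROPOSED_RANGES = {
    "DATE_IN_FUTURE_96_Q": (96, 2920),
    "DATE_IN_FUTURE_Q_PLUS": (2920, None),
}


def shifted_date_hit(diff_hours: float,
                     range_hours: Tuple[Optional[float], Optional[float]]) -> bool:
    """Hypothetical model: hit when min <= diff_hours < max (None = unbounded)."""
    min_h, max_h = range_hours
    lower_ok = min_h is None or diff_hours >= min_h
    upper_ok = max_h is None or diff_hours < max_h
    return lower_ok and upper_ok


def rules_hit(diff_hours: float) -> List[str]:
    """Names of the proposed rules that would fire for a given offset in hours."""
    return [name for name, rng in PROPOSED_RANGES.items()
            if shifted_date_hit(diff_hours, rng)]


# Under the half-open assumption the shared boundary (2920 h) hits exactly one rule.
boundary_hits = rules_hit(2920)
```

If the ranges were closed at both ends instead, an offset of exactly 2920 hours would hit both rules, which is the kind of overlap that incrementing each starting value avoids.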
Stats from comment #5 were roughly the same as predicted, with the break-down already justifying moving forward. I think this will have an overall good impact on _96_XX's FPs without impacting the bulk of its caught spam.

Proposed steps:

1. Remove DATE_IN_FUTURE_96_XX from trunk. (FH_DATE_PAST_20XX is already removed)
2. Wait for/push the GA to score DATE_IN_FUTURE_96_Q and DATE_IN_FUTURE_Q_PLUS.
3. Replace the two removed rules with the two new rules on all branches.

This will likely entail two voting periods for step 1 and then step 2 once the numbers are in.
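The "expected stats" attached to the proposed rules in comment 5 can be cross-checked by summing rows of the ruleqa table from that comment. The Python sketch below does this under two assumptions that the thread does not confirm: that folding rules together simply adds their SPAM% and HAM% columns (the test ranges do not overlap), and that S/O is spam% / (spam% + ham%). The helper name `combine` is hypothetical.

```python
# (SPAM%, HAM%) per test rule, copied from the table in comment 5.
ROWS = {
    "T_DATE_IN_FUTURE_96_WEEK": (0.0913, 0.0),
    "T_DATE_IN_FUTURE_WEEK": (0.0493, 0.0018),
    "T_DATE_IN_FUTURE_MONTH": (0.0376, 0.0011),
    "T_DATE_IN_FUTURE_YEAR": (0.0143, 0.0004),
    "T_DATE_IN_FUTURE_1Y_4Y": (0.0081, 0.0),
    "T_DATE_IN_DISTANT_FUTURE": (0.2445, 0.0004),
}


def combine(names):
    """Sum SPAM% and HAM% over the named rows and derive S/O.

    Assumes non-overlapping rules and S/O = spam / (spam + ham).
    """
    spam = round(sum(ROWS[n][0] for n in names), 4)
    ham = round(sum(ROWS[n][1] for n in names), 4)
    s_o = round(spam / (spam + ham), 3)
    return spam, ham, s_o


# Rows covering 96 hours up to the end of the _MONTH step.
short_range = combine(["T_DATE_IN_FUTURE_96_WEEK",
                       "T_DATE_IN_FUTURE_WEEK",
                       "T_DATE_IN_FUTURE_MONTH"])
# Rows covering everything beyond that.
long_range = combine(["T_DATE_IN_FUTURE_YEAR",
                      "T_DATE_IN_FUTURE_1Y_4Y",
                      "T_DATE_IN_DISTANT_FUTURE"])
```

Under these assumptions the short-range rows reproduce the 0.1782/0.0029 at 0.984 quoted for DATE_IN_FUTURE_96_Q. The long-range rows as quoted sum to 0.2669 spam% rather than the 0.2469 given for DATE_IN_FUTURE_Q_PLUS; the thread does not say which figure is intended, and the S/O rounds to 0.997 either way.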
Looking for two voting periods; one to move forward on my proposal from comment 6 and then if the resulting GA scores don't look so hot, a second one to determine if we want to push it further or just lapse back to DATE_IN_FUTURE_96_XX.
+1 on moving forward with the proposal in comment #6, then deciding based on results what to do next.
(In reply to comment #8)
> +1 on moving forward with the proposal in comment #6 deciding based on results
> what to do next.

+1 from me as well.
help wanted

I'm not sure how to properly remove a rule. I can remove it from rulesrc/10_force_active.cf, but should I also modify rules/20_head_tests.cf and rules/50_scores.cf? Also, we'll need new translations of the description.
(In reply to comment #10)
> help wanted
>
> I'm not sure how to properly remove a rule. I can remove it from
> rulesrc/10_force_active.cf, but should I also modify rules/20_head_tests.cf and
> rules/50_scores.cf? Also, we'll need new translations of the description.

the safest thing is simply to replace with

meta NAMEOFRULE (0)
score NAMEOFRULE 0

if you're worried about causing problems. it'll never hit, so there's no need for a translation.
moving all open 3.3.1 bugs to 3.3.2
Moving back off of Security, which got changed by accident during the mass Target Milestone move.
Moving all open bugs where target is defined and 3.4.0 or lower to 3.4.1 target
This rule has been commented out and is not being published.