Bug 5717 - revert 'rawbody' rule type to split paragraph blocks at 1024 chars, to avoid DOS problems
Summary: revert 'rawbody' rule type to split paragraph blocks at 1024 chars, to avoid ...
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries (show other bugs)
Version: unspecified
Hardware: Other other
: P1 major
Target Milestone: 3.3.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 5041 5425 5486 5644 5795
  Show dependency tree
 
Reported: 2007-11-13 02:20 UTC by Justin Mason
Modified: 2008-03-19 05:05 UTC (History)
1 user (show)



Attachment Type Modified Status Actions Submitter/CLA Status
equivalent patch for 3.2 patch None Mark Martinec [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Justin Mason 2007-11-13 02:20:40 UTC
from http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5644 --
Mark said:

> On a general note: I'm observing occasional similar degenarete cases
> (as are also reported on a mailing list from time to time) ever since
> the change was made from one-line-at-a-time rule application, to
> per-paragraph rule application. Such cases are not frequent, but when
> they hit, it is not unusual they cause a massive disruption in mail flow,
> mostly because such mail comes in multiple similar instances at about
> the same period. Admittedly it is often the mainstream SARE rules that
> take the worst hit, but the problem is not exclusive to SARE rules.
>
> When SpamAssassin takes more then a period a client is willing
> to wait (depending on a setup), a timed-out mail may stay in a
> MTA queue for a retry, aggreviating the situation.
>
> The situation is quite unfortunate. If someone should want to cause
> a DoS, it should not be too hard to target a couple of problematic
> rules and devise a crafted message to purposely cause lengthy regexp
> evaluation. I wonder if this is a good situation for a reputation of
> a service that more and more folks depend upon to run mostly unattended.
>
> Apart from reverting to per-line regexps (at the expense of accuracy),
> I don't have a good solution. Perhaps limiting paragraphs in size,
> maybe compressing spans of 3+ occurrences of same characters before
> applying rules, ... ?

I agree this is an issue.  IMO we should simply revert back to per-line regexps;
I don't think we've gained enough accuracy to make the current situation
worthwhile...
Comment 1 Loren Wilton 2007-11-14 16:45:27 UTC
I have no objection to reverting raw rules to per-line regexps BY DEFAULT.  
After all, most existing rules were written for that regieme.

But if the rule is reverted to per-line, can we than have raw:all or raw:full 
or raw:morethanaline or some such to retain this capability for those few 
rules that would like to specifically run in this mode?
Comment 2 Justin Mason 2007-11-15 02:12:56 UTC
(In reply to comment #1)
> I have no objection to reverting raw rules to per-line regexps BY DEFAULT.  
> After all, most existing rules were written for that regieme.
> 
> But if the rule is reverted to per-line, can we than have raw:all or raw:full 
> or raw:morethanaline or some such to retain this capability for those few 
> rules that would like to specifically run in this mode?

Yeah; probably the best bet would be to add a new rule type.
Comment 3 Justin Mason 2008-02-04 07:51:49 UTC
upping priority.

committers: anyone disagree with this plan?
Comment 4 Mark Martinec 2008-02-21 09:44:51 UTC
Introduced a new subroutine Message::split_into_array_of_short_lines
to nicely split a text into array of paragraph chunks of sizes between
1 kB and 2 kB, returning the resulting array.

Called from get_decoded_body_text_array() and in
Plugin::BodyEval::_check_stock_info (bug 5644, similar
to proposed patch there).

The more degenerated a mail message text is, the more pronounced
a speedup is. Shown elapsed times in seconds for priority 0 tests,
as reported in the 'timing' debug line.

tests_pri_0:  46  ->  45 s  (sample message from bug 5041, s1) (no speedup)
tests_pri_0:  62  ->  49 s  (sample message from bug 5041, s2)
tests_pri_0: 486  ->  82 s  (sample message from bug 5041, s3)
tests_pri_0: 104  -> 10 s   (sample message from bug 5644)
tests_pri_0:  22  -> 11 s   (sample message from bug 5486)
tests_pri_0:   5.7 >  3.8 s (sample message from bug 5795, s1)
tests_pri_0:   5.8 >  2.6 s (sample message from bug 5795, s2)
tests_pri_0:  38  -> 25 s   (0.msg one of our today's messages)
tests_pri_0:  12  ->  8 s   (1.msg)
tests_pri_0:   8.2    4.3   (2.msg)
tests_pri_0:   8.0 >  7.6   (3.msg)
tests_pri_0:   8.8 >  5.8   (4.msg)


$ svn -m 'New sub Message::split_into_array_of_short_lines to nicely split
  a text into array of paragraph chunks of sizes between 1 kB and 2 kB;
  bugs: 5717, 5644, 5795, 5486, 5801, 5041' ci
Sending        SpamAssassin/Message/Node.pm
Sending        SpamAssassin/Message.pm
Sending        SpamAssassin/Plugin/BodyEval.pm
Committed revision 629888.
Comment 5 Mark Martinec 2008-02-21 09:46:45 UTC
> Introduced a new subroutine Message::split_into_array_of_short_lines
> $ svn -m 'New sub Message::split_into_array_of_short_lines to nicely...

typo, the name is actually:
  split_into_array_of_short_paragraphs
Comment 6 Justin Mason 2008-02-21 13:40:22 UTC
excellent!  You kept that one secret ;)
This fix is fine by me. +1

btw, I checked to see just how long a 1k paragraph is.  here's a sample, which
seems like quite a reasonable length:

'This is a demonstration of how long a paragraph size 1024 characters is; it
seems reasonably long if you ask me. Great! This is a demonstration of how long
a paragraph size 1024 characters is; it seems reasonably long if you ask me.
Great! This is a demonstration of how long a paragraph size 1024 characters is;
it seems reasonably long if you ask me. Great! This is a demonstration of how
long a paragraph size 1024 characters is; it seems reasonably long if you ask
me. Great! This is a demonstration of how long a paragraph size 1024 characters
is; it seems reasonably long if you ask me. Great! This is a demonstration of
how long a paragraph size 1024 characters is; it seems reasonably long if you
ask me. Great! This is a demonstration of how long a paragraph size 1024
characters is; it seems reasonably long if you ask me. Great! This is a
demonstration of how long a paragraph size 1024 characters is; it seems
reasonably long if you ask me. Great! This is a demonstration of how long a
paragraph size 1024 characters'
Comment 7 Justin Mason 2008-02-25 07:14:24 UTC
btw, I suggest we either close this bug as FIXED, now that it's applied to
3.3.0, or retarget at 3.2.5 if we agree that this should be applied to that branch.

(Personally I favour the former option.  I think the change is too major for
3.2.x maintainance.)

once this is closed, we can close bugs 5041, 5486, 5644 and 5795, right?
Comment 8 Mark Martinec 2008-02-25 08:28:53 UTC
> btw, I suggest we either close this bug as FIXED, now that it's applied to
> 3.3.0, or retarget at 3.2.5 if we agree that this should be applied to that
> branch.

Fine with me, one way or the other.

> once this is closed, we can close bugs 5041, 5486, 5644 and 5795, right?

5041 not resolved
5486 ok, can be closed
5644 ok, can be closed
5795 ok, can be closed

The bug 5041 situation is even more likely now, as the current default
SA size limit (500 KB according to spamc man page) is now above the
default chunk size to which Outlook chops its message/partial chunks.
Comment 9 Justin Mason 2008-02-25 08:44:39 UTC
(In reply to comment #8)
> The bug 5041 situation is even more likely now, as the current default
> SA size limit (500 KB according to spamc man page) is now above the
> default chunk size to which Outlook chops its message/partial chunks.

ok, as I suggest there, we should at least kill off the easily-fixed part of
that -- not scanning message/partial.  not sure what to do with the rest though. :(
Comment 10 Daryl C. W. O'Shea 2008-02-25 10:46:14 UTC
(In reply to comment #7)
> (Personally I favour the former option.  I think the change is too major for
> 3.2.x maintainance.)

+1
Comment 11 Mark Martinec 2008-02-28 06:50:57 UTC
Created attachment 4267 [details]
equivalent patch for 3.2

Just in case someone wants this fix for 3.2, here is
the patch.
Comment 12 Justin Mason 2008-03-05 08:50:24 UTC
ok, let's close it.  thanks Mark!(In reply to comment #8)
> > btw, I suggest we either close this bug as FIXED, now that it's applied to
> > 3.3.0, or retarget at 3.2.5 if we agree that this should be applied to that
> > branch.
> 
> Fine with me, one way or the other.

ok, closing! thanks Mark...
Comment 13 Mark Martinec 2008-03-11 04:18:31 UTC
> > > btw, I suggest we either close this bug as FIXED, now that it's applied to
> > > 3.3.0, or retarget at 3.2.5 if we agree that this should be applied to that
> > > branch.
> ok, closing!

This entry is targeted at 3.2.5, yet it is closed and a change is only in 3.3.
Wrong target? 

Comment 14 Justin Mason 2008-03-11 04:57:10 UTC
(In reply to comment #13)
> > > > btw, I suggest we either close this bug as FIXED, now that it's applied to
> > > > 3.3.0, or retarget at 3.2.5 if we agree that this should be applied to that branch.
> > ok, closing!
> 
> This entry is targeted at 3.2.5, yet it is closed and a change is only in 3.3.
> Wrong target? 

yep; I retargeted prematurely and then forgot to undo that.