SA Bugzilla – Bug 5717
revert 'rawbody' rule type to split paragraph blocks at 1024 chars, to avoid DOS problems
Last modified: 2008-03-19 05:05:18 UTC
from http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5644 -- Mark said: > On a general note: I'm observing occasional similar degenarete cases > (as are also reported on a mailing list from time to time) ever since > the change was made from one-line-at-a-time rule application, to > per-paragraph rule application. Such cases are not frequent, but when > they hit, it is not unusual they cause a massive disruption in mail flow, > mostly because such mail comes in multiple similar instances at about > the same period. Admittedly it is often the mainstream SARE rules that > take the worst hit, but the problem is not exclusive to SARE rules. > > When SpamAssassin takes more then a period a client is willing > to wait (depending on a setup), a timed-out mail may stay in a > MTA queue for a retry, aggreviating the situation. > > The situation is quite unfortunate. If someone should want to cause > a DoS, it should not be too hard to target a couple of problematic > rules and devise a crafted message to purposely cause lengthy regexp > evaluation. I wonder if this is a good situation for a reputation of > a service that more and more folks depend upon to run mostly unattended. > > Apart from reverting to per-line regexps (at the expense of accuracy), > I don't have a good solution. Perhaps limiting paragraphs in size, > maybe compressing spans of 3+ occurrences of same characters before > applying rules, ... ? I agree this is an issue. IMO we should simply revert back to per-line regexps; I don't think we've gained enough accuracy to make the current situation worthwhile...
I have no objection to reverting raw rules to per-line regexps BY DEFAULT. After all, most existing rules were written for that regieme. But if the rule is reverted to per-line, can we than have raw:all or raw:full or raw:morethanaline or some such to retain this capability for those few rules that would like to specifically run in this mode?
(In reply to comment #1) > I have no objection to reverting raw rules to per-line regexps BY DEFAULT. > After all, most existing rules were written for that regieme. > > But if the rule is reverted to per-line, can we than have raw:all or raw:full > or raw:morethanaline or some such to retain this capability for those few > rules that would like to specifically run in this mode? Yeah; probably the best bet would be to add a new rule type.
upping priority. committers: anyone disagree with this plan?
Introduced a new subroutine Message::split_into_array_of_short_lines to nicely split a text into array of paragraph chunks of sizes between 1 kB and 2 kB, returning the resulting array. Called from get_decoded_body_text_array() and in Plugin::BodyEval::_check_stock_info (bug 5644, similar to proposed patch there). The more degenerated a mail message text is, the more pronounced a speedup is. Shown elapsed times in seconds for priority 0 tests, as reported in the 'timing' debug line. tests_pri_0: 46 -> 45 s (sample message from bug 5041, s1) (no speedup) tests_pri_0: 62 -> 49 s (sample message from bug 5041, s2) tests_pri_0: 486 -> 82 s (sample message from bug 5041, s3) tests_pri_0: 104 -> 10 s (sample message from bug 5644) tests_pri_0: 22 -> 11 s (sample message from bug 5486) tests_pri_0: 5.7 > 3.8 s (sample message from bug 5795, s1) tests_pri_0: 5.8 > 2.6 s (sample message from bug 5795, s2) tests_pri_0: 38 -> 25 s (0.msg one of our today's messages) tests_pri_0: 12 -> 8 s (1.msg) tests_pri_0: 8.2 4.3 (2.msg) tests_pri_0: 8.0 > 7.6 (3.msg) tests_pri_0: 8.8 > 5.8 (4.msg) $ svn -m 'New sub Message::split_into_array_of_short_lines to nicely split a text into array of paragraph chunks of sizes between 1 kB and 2 kB; bugs: 5717, 5644, 5795, 5486, 5801, 5041' ci Sending SpamAssassin/Message/Node.pm Sending SpamAssassin/Message.pm Sending SpamAssassin/Plugin/BodyEval.pm Committed revision 629888.
> Introduced a new subroutine Message::split_into_array_of_short_lines > $ svn -m 'New sub Message::split_into_array_of_short_lines to nicely... typo, the name is actually: split_into_array_of_short_paragraphs
excellent! You kept that one secret ;) This fix is fine by me. +1 btw, I checked to see just how long a 1k paragraph is. here's a sample, which seems like quite a reasonable length: 'This is a demonstration of how long a paragraph size 1024 characters is; it seems reasonably long if you ask me. Great! This is a demonstration of how long a paragraph size 1024 characters is; it seems reasonably long if you ask me. Great! This is a demonstration of how long a paragraph size 1024 characters is; it seems reasonably long if you ask me. Great! This is a demonstration of how long a paragraph size 1024 characters is; it seems reasonably long if you ask me. Great! This is a demonstration of how long a paragraph size 1024 characters is; it seems reasonably long if you ask me. Great! This is a demonstration of how long a paragraph size 1024 characters is; it seems reasonably long if you ask me. Great! This is a demonstration of how long a paragraph size 1024 characters is; it seems reasonably long if you ask me. Great! This is a demonstration of how long a paragraph size 1024 characters is; it seems reasonably long if you ask me. Great! This is a demonstration of how long a paragraph size 1024 characters'
btw, I suggest we either close this bug as FIXED, now that it's applied to 3.3.0, or retarget at 3.2.5 if we agree that this should be applied to that branch. (Personally I favour the former option. I think the change is too major for 3.2.x maintainance.) once this is closed, we can close bugs 5041, 5486, 5644 and 5795, right?
> btw, I suggest we either close this bug as FIXED, now that it's applied to > 3.3.0, or retarget at 3.2.5 if we agree that this should be applied to that > branch. Fine with me, one way or the other. > once this is closed, we can close bugs 5041, 5486, 5644 and 5795, right? 5041 not resolved 5486 ok, can be closed 5644 ok, can be closed 5795 ok, can be closed The bug 5041 situation is even more likely now, as the current default SA size limit (500 KB according to spamc man page) is now above the default chunk size to which Outlook chops its message/partial chunks.
(In reply to comment #8) > The bug 5041 situation is even more likely now, as the current default > SA size limit (500 KB according to spamc man page) is now above the > default chunk size to which Outlook chops its message/partial chunks. ok, as I suggest there, we should at least kill off the easily-fixed part of that -- not scanning message/partial. not sure what to do with the rest though. :(
(In reply to comment #7) > (Personally I favour the former option. I think the change is too major for > 3.2.x maintainance.) +1
Created attachment 4267 [details] equivalent patch for 3.2 Just in case someone wants this fix for 3.2, here is the patch.
ok, let's close it. thanks Mark!(In reply to comment #8) > > btw, I suggest we either close this bug as FIXED, now that it's applied to > > 3.3.0, or retarget at 3.2.5 if we agree that this should be applied to that > > branch. > > Fine with me, one way or the other. ok, closing! thanks Mark...
> > > btw, I suggest we either close this bug as FIXED, now that it's applied to > > > 3.3.0, or retarget at 3.2.5 if we agree that this should be applied to that > > > branch. > ok, closing! This entry is targeted at 3.2.5, yet it is closed and a change is only in 3.3. Wrong target?
(In reply to comment #13) > > > > btw, I suggest we either close this bug as FIXED, now that it's applied to > > > > 3.3.0, or retarget at 3.2.5 if we agree that this should be applied to that branch. > > ok, closing! > > This entry is targeted at 3.2.5, yet it is closed and a change is only in 3.3. > Wrong target? yep; I retargeted prematurely and then forgot to undo that.