SA Bugzilla – Bug 5644
Message sends BodyEval::check_stock_info into hard loop
Last modified: 2008-03-05 08:53:33 UTC
The attached message appears to cause the _check_stock_info() routine in Plugin/BodyEval.pm to enter a hard loop (consuming all available CPU until killed). The problem appears to be related to a massive amount of whitespace in the message body; when this text is removed, the problem disappears. We are running SA 3.2.1 under Perl 5.8.8 on OpenSuSE 10.1.
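A minimal Python sketch of the failure mode (the pattern here is a hypothetical stand-in; the real rule is Perl code in Plugin/BodyEval.pm):

```python
import re

# Hypothetical "keyword : value"-style pattern; NOT the actual rule regexp.
PAT = re.compile(r"\s*:\s*\d")

# Against a body that is mostly whitespace and contains no colon, the
# engine greedily consumes the run with \s*, fails to find ':', and
# backtracks one character at a time -- and it repeats that at every
# possible start position, so the scan is roughly quadratic in the
# length of the whitespace run.
body = " " * 2000  # kept small; at hundreds of KB this looks like a hard loop

assert PAT.search(body) is None           # eventually fails, slowly
assert PAT.search("tip : 5") is not None  # a normal body matches quickly
```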
Created attachment 4120 [details] message that triggers the loop in _check_stock_info()
The rule still gets good hits:

0.00000 0.5299 3543 of 668466 messages 0.0137 18 of 131397 messages 0.975 0.81 4.20 TVD_STOCK1
http://ruleqa.spamassassin.org/?daterev=20070910-r574178-n&rule=%2FTVD_STOCK&srcpath=&g=Change

so we can't just delete it...
Created attachment 4185 [details]
Avoid problem by applying regexp to one line at a time

Avoid pathological cases where the regexp takes a massive amount of time by applying the regexp to only one line at a time. This change prevents the rule from catching instances where the whitespace in "keyword\s*:\s*value" contains a newline; I don't know if this happens often enough to be a problem.
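The per-line approach of attachment 4185 can be sketched in Python (the pattern and function name are illustrative stand-ins, not the Perl code):

```python
import re

# Hypothetical stand-in for the stock-info pattern; the real rule lives
# in Perl's Plugin/BodyEval.pm.
STOCK_RE = re.compile(r"(?:symbol|price|target)\s*:\s*\S+", re.IGNORECASE)

def check_stock_info_per_line(body: str) -> int:
    """Apply the regexp one line at a time, as attachment 4185 does.

    This bounds the text any single match attempt can backtrack over to
    one line, at the cost of missing keyword/value pairs whose
    whitespace contains a newline.
    """
    hits = 0
    for line in body.splitlines():
        if STOCK_RE.search(line):
            hits += 1
    return hits
```

Note the trade-off called out above: `check_stock_info_per_line("price\n: 5.00")` returns 0, whereas a whole-body `\s*` would have matched across the newline.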
Indeed it is quite terrible. 3.3 produces the following timing report:

01:19:26.771 112.544 0.001 [65209] dbg: timing: total 111805 ms - init: 6131 (5.5%), parse: 31 (0.0%), extract_message_metadata: 373 (0.3%), get_uri_detail_list: 32 (0.0%), tests_pri_-1000: 55 (0.0%), tests_pri_-950: 4 (0.0%), tests_pri_-900: 5 (0.0%), tests_pri_-400: 1051 (0.9%), check_bayes: 814 (0.7%), tests_pri_0: 103731 (92.8%), check_spf: 365 (0.3%), poll_dns_idle: 97 (0.1%), check_dkim_signature: 309 (0.3%), check_razor2: 840 (0.8%), check_pyzor: 0.09 (0.0%), check_dcc: 333 (0.3%), tests_pri_100: 6 (0.0%), tests_pri_500: 291 (0.3%), tests_pri_1000: 83 (0.1%), total_awl: 77 (0.1%), check_awl: 21 (0.0%), check_awl_reput: 3 (0.0%), update_awl: 3 (0.0%)

A minute and 45 seconds of CPU-intensive grinding for priority-0 rules.

On a general note: I'm observing occasional similar degenerate cases (as are also reported on a mailing list from time to time) ever since the change was made from one-line-at-a-time rule application to per-paragraph rule application. Such cases are not frequent, but when they hit, it is not unusual for them to cause massive disruption in mail flow, mostly because such mail arrives in multiple similar instances at about the same time. Admittedly it is often the mainstream SARE rules that take the worst hit, but the problem is not exclusive to SARE rules.

When SpamAssassin takes longer than a client is willing to wait (depending on the setup), a timed-out mail may stay in an MTA queue for a retry, aggravating the situation.

The situation is quite unfortunate. If someone wanted to cause a DoS, it would not be too hard to target a couple of problematic rules and craft a message that purposely causes lengthy regexp evaluation. I wonder what this means for the reputation of a service that more and more folks depend upon to run mostly unattended.

Apart from reverting to per-line regexps (at the expense of accuracy), I don't have a good solution.
Perhaps limiting paragraphs in size, maybe compressing spans of 3+ occurrences of the same character before applying rules, ...?
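The run-compression idea could be sketched like this (illustrative only, not SpamAssassin code; the threshold of 3 is the one floated above):

```python
import re

def squash_runs(text: str, keep: int = 3) -> str:
    """Collapse any run of more than `keep` identical characters down
    to `keep` copies before rule application, so a megabyte of
    whitespace becomes a few characters and can no longer feed
    pathological regexp backtracking."""
    # (.)\1{keep,} matches a character followed by `keep` or more
    # repeats, i.e. a run longer than `keep`; DOTALL lets '.' cover
    # newlines so mixed whitespace runs are squashed too.
    pattern = r"(.)\1{%d,}" % keep
    return re.sub(pattern, r"\1" * keep, text, flags=re.DOTALL)
```

For example, `squash_runs("go" + " " * 500 + "stop")` yields `"go   stop"`, while runs of exactly three characters are left alone.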
(In reply to comment #4)
> Apart from reverting to per-line regexps (at the expense of accuracy),
> I don't have a good solution. Perhaps limiting paragraphs in size,
> maybe compressing spans of 3+ occurrences of same characters before
> applying rules, ... ?

I think we should discuss reverting back to per-line regexps, as I agree with your thoughts regarding reliability etc. Shall I open a bug?
ok, opened, bug 5717.
This is fixed in 3.3.0 due to the fix for bug 5717, which splits the 'rawbody' representation into chunks of between 1 and 2 KB.
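The bug 5717 chunking can be sketched as follows (the break-at-newline heuristic and the exact sizes are assumptions, not the actual 3.3.0 Perl code):

```python
def split_rawbody(text: str, chunk: int = 1024) -> list[str]:
    """Split body text into chunks of between roughly 1 and 2 KB,
    preferring to break at a newline in the second kilobyte so most
    lines stay intact, which bounds the text any one regexp match
    attempt can backtrack over."""
    out = []
    while len(text) > 2 * chunk:
        nl = text.rfind("\n", chunk, 2 * chunk)
        # Break just after the last newline in the 1-2 KB window;
        # if there is none, hard-split at the 2 KB limit.
        cut = nl + 1 if nl != -1 else 2 * chunk
        out.append(text[:cut])
        text = text[cut:]
    if text:
        out.append(text)
    return out
```

A whitespace bomb like the attached message then yields many small chunks instead of one huge string, so each per-chunk match attempt terminates quickly even if the rule itself is unchanged.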