Bug 5644 - Message sends BodyEval::check_stock_info into hard loop
Summary: Message sends BodyEval::check_stock_info into hard loop
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins
Version: 3.2.1
Hardware: PC Linux
Importance: P3 major
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on: 5717
Blocks:
Reported: 2007-09-10 15:47 UTC by Gary Windham
Modified: 2008-03-05 08:53 UTC



Attachments:
  message that triggers the loop in _check_stock_info() (text/plain), submitted by Gary Windham [NoCLA]
  Avoid problem by applying regexp to one line at a time (patch), submitted by Matthew Cline [HasCLA]

Description Gary Windham 2007-09-10 15:47:39 UTC
The attached message appears to be causing the _check_stock_info() routine in
Plugin/BodyEval.pm to go into a hard loop (consuming all available CPU until
killed).  The problem appears to be related to a massive amount of whitespace in
the body of the message; when this text is removed the problem disappears.

We are running SA 3.2.1 under Perl 5.8.8 on OpenSuSE 10.1.
Comment 1 Gary Windham 2007-09-10 15:49:08 UTC
Created attachment 4120 [details]
message that triggers the loop in _check_stock_info()
Comment 2 Justin Mason 2007-09-11 02:44:17 UTC
the rule still gets good hits:

0.00000   0.5299% (3543 of 668466 messages)   0.0137% (18 of 131397 messages)   0.975   0.81   4.20   TVD_STOCK1

http://ruleqa.spamassassin.org/?daterev=20070910-r574178-n&rule=%2FTVD_STOCK&srcpath=&g=Change

so we can't just delete it...
Comment 3 Matthew Cline 2007-11-08 11:36:17 UTC
Created attachment 4185 [details]
Avoid problem by applying regexp to one line at a time

Avoid pathological cases where the regexp takes a massive amount of time by
applying the regexp to only one line at a time.  This change prevents the rule
from catching instances where the whitespace in "keyword\s*:\s*value" contains
a newline; I don't know if this happens often enough to be a problem.
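
The per-line approach can be sketched as follows. This is a Python illustration, not the actual Perl patch, and the STOCK_INFO pattern is a hypothetical stand-in for the rule in Plugin/BodyEval.pm:

```python
import re

# Hypothetical stand-in for the stock-info pattern; the real rule in
# Plugin/BodyEval.pm differs.  "keyword\s*:\s*value" matching is what
# lets the whitespace span a newline in per-paragraph mode.
STOCK_INFO = re.compile(r"(symbol|price|target)\s*:\s*(\S+)", re.IGNORECASE)

def check_per_paragraph(body: str) -> int:
    """Apply the pattern to the whole body at once (pre-patch behaviour)."""
    return len(STOCK_INFO.findall(body))

def check_per_line(body: str) -> int:
    """Apply the pattern one line at a time (the patched behaviour).

    This bounds the text each match attempt can scan, at the cost of
    missing matches where the whitespace around ':' contains a newline.
    """
    return sum(len(STOCK_INFO.findall(line)) for line in body.splitlines())

body = "Symbol :\n  XYZ\nPrice: 0.15\n"
# Per-paragraph matching also sees "Symbol :\n  XYZ"; per-line matching
# only finds "Price: 0.15".
```

The trade-off Matthew describes is exactly the "Symbol :\n  XYZ" case above: per-line matching cannot see it, and whether that costs real hits is the open question.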
Comment 4 Mark Martinec 2007-11-12 16:44:49 UTC
Indeed it is quite terrible. SpamAssassin 3.3 produces the following timing report:

01:19:26.771 112.544 0.001 [65209] dbg: timing: total 111805 ms -
init: 6131 (5.5%), parse: 31 (0.0%), extract_message_metadata: 373 (0.3%),
get_uri_detail_list: 32 (0.0%), tests_pri_-1000: 55 (0.0%),
tests_pri_-950: 4 (0.0%), tests_pri_-900: 5 (0.0%),
tests_pri_-400: 1051 (0.9%), check_bayes: 814 (0.7%),
tests_pri_0: 103731 (92.8%), check_spf: 365 (0.3%), poll_dns_idle: 97 (0.1%),
check_dkim_signature: 309 (0.3%), check_razor2: 840 (0.8%),
check_pyzor: 0.09 (0.0%), check_dcc: 333 (0.3%), tests_pri_100: 6 (0.0%),
tests_pri_500: 291 (0.3%), tests_pri_1000: 83 (0.1%), total_awl: 77 (0.1%), 
check_awl: 21 (0.0%), check_awl_reput: 3 (0.0%), update_awl: 3 (0.0%)

A minute and 45 seconds of CPU-intensive grinding for priority-0 rules.
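
As a quick sanity check on the report above, the priority-0 figure accounts for essentially all of the run time:

```python
# Figures taken from the timing report in this comment.
total_ms = 111805
tests_pri_0_ms = 103731

pct = tests_pri_0_ms / total_ms * 100
minutes, seconds = divmod(tests_pri_0_ms // 1000, 60)
# 103731 ms is roughly 1 min 44 s, i.e. 92.8% of the 111805 ms total.
```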

On a general note: I'm observing occasional similar degenerate cases
(as are also reported on the mailing list from time to time) ever since
the change from one-line-at-a-time rule application to per-paragraph
rule application. Such cases are not frequent, but when they hit, it is
not unusual for them to cause a massive disruption in mail flow, mostly
because such mail arrives in multiple similar instances at about the
same time. Admittedly it is often the mainstream SARE rules that take
the worst hit, but the problem is not exclusive to SARE rules.

When SpamAssassin takes longer than a client is willing to wait
(depending on the setup), a timed-out mail may stay in the MTA queue
for a retry, aggravating the situation.

The situation is quite unfortunate. If someone wanted to cause a DoS,
it would not be too hard to target a couple of problematic rules and
craft a message that purposely causes lengthy regexp evaluation. I
wonder what this means for the reputation of a service that more and
more folks depend upon to run mostly unattended.

Apart from reverting to per-line regexps (at the expense of accuracy),
I don't have a good solution. Perhaps limiting paragraphs in size,
or compressing spans of 3+ occurrences of the same character before
applying rules, ...?
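
One possible reading of the run-compression idea, sketched in Python; the cap of three repeats and the preprocessing step are illustrative assumptions, not committed SpamAssassin behaviour:

```python
import re

# Collapse any run of 4+ identical characters down to 3 before rule
# evaluation, so a pathological span of whitespace (or dots, dashes,
# etc.) can no longer feed regexp backtracking.  DOTALL lets the run
# include newlines.
_RUN = re.compile(r"(.)\1{3,}", re.DOTALL)

def compress_runs(text: str) -> str:
    """Replace any run of 4+ identical characters with 3 of them."""
    return _RUN.sub(lambda m: m.group(1) * 3, text)

pathological = "stock" + " " * 10000 + "tip"
compressed = compress_runs(pathological)
# The 10000-space span shrinks to 3 spaces, bounding regexp work.
```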

Comment 5 Justin Mason 2007-11-13 00:53:58 UTC
(In reply to comment #4)
> On a general note: I'm observing occasional similar degenerate cases
> (as are also reported on the mailing list from time to time) ever since
> the change from one-line-at-a-time rule application to per-paragraph
> rule application. Such cases are not frequent, but when they hit, it is
> not unusual for them to cause a massive disruption in mail flow, mostly
> because such mail arrives in multiple similar instances at about the
> same time. Admittedly it is often the mainstream SARE rules that take
> the worst hit, but the problem is not exclusive to SARE rules.
> 
> When SpamAssassin takes longer than a client is willing to wait
> (depending on the setup), a timed-out mail may stay in the MTA queue
> for a retry, aggravating the situation.
> 
> The situation is quite unfortunate. If someone wanted to cause a DoS,
> it would not be too hard to target a couple of problematic rules and
> craft a message that purposely causes lengthy regexp evaluation. I
> wonder what this means for the reputation of a service that more and
> more folks depend upon to run mostly unattended.
> 
> Apart from reverting to per-line regexps (at the expense of accuracy),
> I don't have a good solution. Perhaps limiting paragraphs in size,
> or compressing spans of 3+ occurrences of the same character before
> applying rules, ...?

I think we should discuss reverting to per-line regexps, as
I agree with your thoughts regarding reliability etc.

Shall I open a bug?
Comment 6 Justin Mason 2007-11-13 02:21:08 UTC
ok, opened, bug 5717.
Comment 7 Justin Mason 2008-03-05 08:53:33 UTC
this is fixed in 3.3.0 due to the fix for bug 5717, which splits the 'rawbody'
representation into chunks of between 1 and 2 KB.
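
The chunking idea can be sketched like this. This is a Python illustration; the constants and the newline-preferring split are assumptions about the approach, not the code committed for bug 5717:

```python
# Split a large rawbody string into chunks of between 1 and 2 KB,
# preferring to break at a newline inside that window so most matches
# are not cut mid-line.  Bounding the chunk size bounds the text any
# single regexp match attempt can scan.
MIN_CHUNK = 1024
MAX_CHUNK = 2048

def split_rawbody(text: str) -> list:
    chunks = []
    while len(text) > MAX_CHUNK:
        # Look for a newline in the [MIN_CHUNK, MAX_CHUNK) window.
        cut = text.rfind("\n", MIN_CHUNK, MAX_CHUNK)
        if cut == -1:
            cut = MAX_CHUNK          # no newline in the window: hard split
        else:
            cut += 1                 # keep the newline with the left chunk
        chunks.append(text[:cut])
        text = text[cut:]
    chunks.append(text)
    return chunks
```

A message made entirely of one long whitespace run, like the attachment here, simply gets hard-split every 2 KB, so no single match attempt ever sees the whole span.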