Bug 6558 - [review] body rules having "tflags multiple" may cause infinite loop when compiled
Summary: [review] body rules having "tflags multiple" may cause infinite loop when com...
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: sa-compile (show other bugs)
Version: 3.3.1
Hardware: PC Linux
: P1 major
Target Milestone: 3.3.2
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard: rtc
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-20 11:56 UTC by John Hardin
Modified: 2011-05-05 23:46 UTC (History)
5 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
Short artificial email to trigger this bug text/plain None Florian Hinzmann [NoCLA]
Debug log entering the endless loop, spamassassin -D text/plain None Florian Hinzmann [NoCLA]
Patch that fixes loop issue. patch None melson+sa@fastmail.net [NoCLA]
Avoids [perl #86784] bug patch None Mark Martinec [HasCLA]
let sa-update issue a warning when stumbling across a rule which could hit bug 6558 patch None Mark Martinec [HasCLA]
A combo patch of what will go into the 3.3.2 patch None Mark Martinec [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description John Hardin 2011-03-20 11:56:51 UTC
__PILL_PRICE subrules with "tflags multiple" cause infinite loop on matching message text when compiled on ia32. Removing the "tflags multiple" corrects the problem. No issues when not compiled.

http://mail-archives.apache.org/mod_mbox/spamassassin-users/201103.mbox/ajax/%3C4D85D121.5020704@fastmail.net%3E

I'm disabling those rules until what I assume is is a bug in sa-compile or re2c is fixed.
Comment 2 Karsten Bräckelmann 2011-03-20 16:33:43 UTC
Raising Priority.

I guess this warrants a manually triggered rules update release, with the affected rules' tflags removed. Just minor surgery to the current tarball.

How again can we do this? Time to dig through my archive...
Comment 3 John Hardin 2011-03-20 20:19:39 UTC
To spare people wading through the mail archive, if these rules are causing you problems disable them by putting this in your local site config:

meta __PILL_PRICE_1  (0)
meta __PILL_PRICE_2  (0)
meta __PILL_PRICE_3  (0)

Thanks, Karsten!
Comment 4 John Hardin 2011-03-20 20:21:56 UTC
Problem does also seem to affect x86_64
Comment 5 Karsten Bräckelmann 2011-03-20 20:31:12 UTC
Heh, awesome -- I was just about to add the same note here, after reading confirmation by a user, that my untested fix of disabling the sub-rules without breaking dependencies worked.

The fix of commenting out tflags multiple for the affected rules (comment 0) requires messing with the sa-update channel. The fix in comment 3 does not, though it will require being reverted at a later point.

As another note, IIRC there are now two reports on the users list, that this does not only affect 32 bit architecture.
Comment 6 Karsten Bräckelmann 2011-03-20 20:39:05 UTC
Also, there are reports that one system was affected by this issue, though an almost identical system (same OS, same packages, both Xeons, different speeds) was not.

That might hint, that this is a really obscure bug with re2c, triggered in some very specific cases only.
Comment 7 Florian Hinzmann 2011-03-22 04:27:10 UTC
Just to confirm this: It does affect 64bit, too.

celduin:~# uname -a
Linux celduin 2.6.32-5-amd64 #1 SMP Wed Jan 12 03:40:32 UTC 2011 x86_64 GNU/Linux
celduin:~# spamassassin --version
SpamAssassin version 3.3.1
  running on Perl version 5.10.1
celduin:~# re2c --version
re2c 0.13.5

I will attach a spamassassin debug log and a very short mail I used
to trigger this bug.
Comment 8 Florian Hinzmann 2011-03-22 04:28:48 UTC
Created attachment 4853 [details]
Short artificial email to trigger this bug
Comment 9 Florian Hinzmann 2011-03-22 04:29:45 UTC
Created attachment 4854 [details]
Debug log entering the endless loop, spamassassin -D
Comment 10 melson+sa 2011-03-22 09:45:59 UTC
Created attachment 4855 [details]
Patch that fixes loop issue.

Meant more to aid in debugging than an actual fix.
Comment 11 melson+sa 2011-03-22 09:50:01 UTC
Oop sorry for that - meant to leave an actual useful comment (not too familiar with bugzilla). 

Don't want to clutter the bugzilla up with too much noise, but that patch seems to fix the loop issue for me in my environment, though I'm sure there are still outstanding questions.  I sent a full(er) analysis of the issue to the dev list as well.
Comment 12 Mark Martinec 2011-03-22 16:13:00 UTC
Matt Elson wrote on the mailing list a valuable description of the
program flow - valuable for posteriority, so I'm including it here:

Hey all,

I've been doing some investigation of Bug 6558 (__PILL_PRICE_[1-3] + 
Compiled Rulesets == endless loop_ on my end and want to share the 
results - I'm not super familiar with SpamAssassin's code base, so 
apologies if I misread anything or am totally off.

Long story short since this email has gotten a little bit lengthy, I 
think the problem lies in a function being created around line 97 in 
OneLineBodyRuleType that Rule2XSBody later uses in some cases.

Lengthy analysis below:

At this point, I have a test machine (x64) where I have removed *all 
rules* except the rules I'm testing and disabled every plugin except 
Rule2XSBody.pm and Check.pm.

First, I've played around with the regexes and found that something as 
simple as:

body        LOCAL_TEST         pill
tflags      LOCAL_TEST         multiple

will cause the problem (when run on the short artificial email attached 
to the bugzilla).

Interestingly enough, if I make this case insensitive

body        LOCAL_TEST         /pill/i
tflags      LOCAL_TEST         multiple

The problem goes away.

So at that point I started poking around the code for Rule2XSBody 
because I was curious... and this is where I'm probably a bit out of my 
depth.  But, it looks like the reason the case insensitive rule does 
not hit the problem is because the results of the CompiledRegexps scan 
is flagged as "non lossy" (l=0) and gets hits by the if statement around 
line 243 in Rule2XSBody.pm.  Case sensitive rules are flagged as lossy 
(l=1) by the CompiledRegexps and have to move on.  They get up to the 
stanza at line 261 - if (!&{$fn} ($scanner, $line) && $do_dbg).. and 
this is where things are getting stuck for me.  This is where it got 
interesting - when I added in my debugging and ran through the original 
__PILL_PRICE_[1-3] rules that created it - they're all flagged as lossy.

$fn seems to be a dynamically created function that Rule2XSBody (by way 
of OneLineBodyRuleType.pm) creates - unfortunately I can't quite 
decipher the code  - line 142 in OneLineBodyRuleType.pm is where it's 
made.  While I can't make out what the function's supposed to do, it is 
worth noting that when the rule it's being created for has a tflag of 
"multiple", the function has a while condition: i.e.

while ($_[1] =~ '.$pat.'g) {

Whereas if the tflag is NOT multiple, it's just an if condition

if ($_[1] =~ '.$pat.') {

I'm not quite sure what's supposed to break out of the while loop, but 
I'm fairly sure it's not getting correctly broken and is where 
everything's getting stuck.  I changed the "while" to an if just to test 
this theory and once I do this.. the problem goes away for me, 
completely on all regexes, both my simple pill and the more elaborate 
original ones (and rewrites).  I'd imagine not a real solution, but good 
for testing. (simple patch attached in case I was unclear about the
change).

This doesn't quite explain why the problem doesn't emerge for everyone 
using compiled rules (though maybe the difference is whether or not the 
CompiledRegexpsModule is flagging the rules as lossy; that might differ 
from architecture to architecture and environment to environment and 
when the rules are NOT lossy, they don't get to the bit of code that 
seems to be causing the problem).

For further information, here's what the dynamic function function looks 
like when I spit it out with some debugging.

  sub JUST_PILLS_one_line_body_test { {
       pos $_[1] = 0;

#line 1 "/var/lib/spamassassin/3.003001/local.cf, rule JUST_PILLS,"
       while ($_[1] =~ /pill/g) {
         my $self = $_[0];
         $self->got_hit(q{JUST_PILLS}, "BODY: ", ruletype => 
"one_line_body");

         dbg("rules: ran one_line_body rule JUST_PILLS ======> got hit: 
\"" . ($&|| "negative match") . "\"");

       }
       } }


(notice that that's the debug statement that you see repeated over and 
over; the comments before ${fn} is called suggest that this is running 
the real regex).

Like I said, I'm having trouble making sense of it ($_ was never a 
friend of mine) and for the life of me I don't know how the loop is 
supposed to end.

Another little hack I did that seems to fix it (though goodness knows at 
what cost) is to add an s at front

(i.e. making it while $_[1] =~ s/pill/g).

Again, not suggesting that as a real solution since modifying variables 
arbitrarily seems.. unwise, but maybe it will help troubleshoot/debug 
further.

Anyway, hope this helps out!

Matt


[...] would something like this
help with any performance degradation caused by my initial patch:

my $line = \$_[1];
while ($$line =~ '.$pat.'g) {

Or does it amount to more or less the same thing as my original patch? 
I'm not completely clear in how perl handles references and didn't
really do anything more than a cursory test on it (which worked).

Matt
Comment 13 Mark Martinec 2011-03-22 16:26:34 UTC
Created attachment 4856 [details]
Avoids [perl #86784] bug

Seems to be a perl bug, affecting 5.8, 5.10, 5.12, but not 5.13.10.
I have reported the perl bug as [perl #86784]:
  http://rt.perl.org/rt3/Ticket/Display.html?id=86784

For a workaround I'm attaching the patch, which is essentially
Matt Elson's last suggested patch. It avoids copying the
string (Justin Mason's concern), while also working around
the perl bug. As noted above the perl 5.14 will no longer
exhibit this bug.

trunk:
  fix to Bug 6558: __PILL_PRICE rules (having "tflags multiple")
  cause infinite loop when compiled
Sending lib/Mail/SpamAssassin/Plugin/OneLineBodyRuleType.pm
Committed revision 1084340.
Comment 14 Mark Martinec 2011-03-23 09:03:25 UTC
> Seems to be a perl bug, affecting 5.8, 5.10, 5.12, but not 5.13.10.
> I have reported the perl bug as [perl #86784]:
>   http://rt.perl.org/rt3/Ticket/Display.html?id=86784

I didn't realize this perl bug was reported before - so the #86784
is a duplicate of:
* #8262: //g loops infinitely on tainted data
* #27344: pos() does not get updated when running in taint mode
* #5475: Bug in taint+regex+hash/arrays

The #8262 brings the fix:

  Now fixed by commit fd69380d5d5b95ef16e2521cf4251b34ee0ce151
  in branch davem/post-5.12, which should be merged back into blead
  once 5.12 has been released, and thus appear in 5.13 onwards.

So yes, perl 5.13 is no longer affected.
According to explanations there, using an array element or a hash
element with magic as an operand of /.../g triggers the issue
(the pos gets assigned to a temporary copy and is then lost).
Both proposed solutions by Matt avoid the bug. The magic may be
a taint, but can be other, like substr magic. In this regard we
now know that we have avoided the bug with the proposed patch
or a similar change.
Comment 15 Mark Martinec 2011-03-23 16:33:38 UTC
Created attachment 4859 [details]
let sa-update issue a warning when stumbling across a rule which could hit bug 6558

Guenther wrote on the ML:

> Can we come up with some (brief) instructions, and a test rule that
> should trigger the bug? So we can have those previously effected verify
> it, and more importantly, verify the (tiny) patch to fix the issue for
> them? A test case to be sure, whether or not future rule updates even
> would include what's just broken the compile?

> Also, while this Perl bug is out there (which undoubtful will exists for
> a long time on production machines), should we refrain from tflags
> multiple body rules?

I guess so.

> Though yeah, agreed, a lint check at least for Jenkins to catch
> inappropriate tflags multiple for body rules and version <3.3.2 sounds
> like a good precaution. Might back-fire, though, due to generated
> scores. In the worst case, this is scorched earth until 3.4. :(


Attached is a patch to BodyRuleBaseExtractor.pm which will
make sa-compile report a warning when hitting a rule with
tflag=multiple and a lossy match. Appropriate for Jenkins runs.

E.g. (hand wrapped and cleansed):

# sa-compile                 
Mar 23 21:19:07.288 [9363] info: generic: base extraction starting....
Mar 23 21:19:07.288 [9363] info: generic: extracting from rules of ...
zoom: rule __PILL_PRICE_3 will loop on SpamAssassin older that 3.3.2
  running under Perl 5.12 or older, Bug 6558
zoom: rule __PILL_PRICE_1 will loop on SpamAssassin older that 3.3.2
  running under Perl 5.12 or older, Bug 6558
zoom: rule __PILL_PRICE_2 will loop on SpamAssassin older that 3.3.2
  running under Perl 5.12 or older, Bug 6558


trunk:
  Bug 6558: let sa-update issue a warning when stumbling across a rule
  which could hit this bug
Sending lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Committed revision 1084721.
Comment 16 Mark Martinec 2011-03-23 18:38:55 UTC
trunk:
  add a dummy function feature_bug6558_free() to allow
  conditionalizing problematic rules;
  a "feature" function must return a true to be considered'
Sending lib/Mail/SpamAssassin/Conf/Parser.pm
Sending lib/Mail/SpamAssassin/Conf.pm
Committed revision 1084790.

So something like this can be used to safely have problematic
rules in the ruleset (regardless of SA version):

if can(Mail::SpamAssassin::Conf::feature_bug6558_free)
meta  MANY_PILL_PRICE  ...
endif
Comment 17 Kevin A. McGrail 2011-03-24 11:03:56 UTC
(In reply to comment #16)
> trunk:
>   add a dummy function feature_bug6558_free() to allow
>   conditionalizing problematic rules;
>   a "feature" function must return a true to be considered'
> Sending lib/Mail/SpamAssassin/Conf/Parser.pm
> Sending lib/Mail/SpamAssassin/Conf.pm
> Committed revision 1084790.
> 
> So something like this can be used to safely have problematic
> rules in the ruleset (regardless of SA version):
> 
> if can(Mail::SpamAssassin::Conf::feature_bug6558_free)
> meta  MANY_PILL_PRICE  ...
> endif

I thought this was an elegantly simple solution.  +1 KAM
Comment 18 John Hardin 2011-03-24 11:22:08 UTC
(In reply to comment #17)
>
> I thought this was an elegantly simple solution.  +1 KAM

Agreed. +1 JDH
Comment 19 John Hardin 2011-03-26 19:48:46 UTC
(In reply to comment #15)
> Created an attachment (id=4859) [details]
> 
> Attached is a patch to BodyRuleBaseExtractor.pm which will
> make sa-compile report a warning when hitting a rule with
> tflag=multiple and a lossy match.

I restored the original problematic PILL_PRICE rules in my sandbox and ran this, and they did _not_ generate warnings. Some other unpromoted rules currently in my sandbox do generate warnings.
Comment 20 John Hardin 2011-03-27 12:43:33 UTC
(In reply to comment #19)
> (In reply to comment #15)
> > Created an attachment (id=4859) [details] [details]
> > 
> > Attached is a patch to BodyRuleBaseExtractor.pm which will
> > make sa-compile report a warning when hitting a rule with
> > tflag=multiple and a lossy match.
> 
> I restored the original problematic PILL_PRICE rules in my sandbox and ran
> this, and they did _not_ generate warnings. Some other unpromoted rules
> currently in my sandbox do generate warnings.

...that must have been a fluke. I ran sa-compile again and _1 and _2 are flagged, but _3 is not.
Comment 21 Kevin A. McGrail 2011-05-05 19:50:32 UTC
This has a patch from Mark and a vote from me and John.  This is ready to commit.  I do not believe it needs more votes.
Comment 22 John Hardin 2011-05-05 20:04:21 UTC
One caveat: sa-compile appears to stop issuing the warning when perl hits a version that no longer has the bug, so if your test platform is 5.12.3 (on Gentoo, at least) or higher you won't get warned about rules that may be problematic on older production installs.
Comment 23 Mark Martinec 2011-05-05 23:41:58 UTC
Created attachment 4875 [details]
A combo patch of what will go into the 3.3.2
Comment 24 Mark Martinec 2011-05-05 23:46:52 UTC
branch 3.3:
  Bug 6558: body rules having "tflags multiple" may cause infinite loop
  when compiled - a workaround
  Sending lib/Mail/SpamAssassin/Conf.pm
  Sending lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
  Sending lib/Mail/SpamAssassin/Plugin/OneLineBodyRuleType.pm
Committed revision 1100002.

Note this provides a workaround for the perl bug, a feature_bug6558_free
function, a small debugging improvement - but does not issue a warning
about possible problems with older versions, as this does not belong
to a 3.3.2 package.

Closing.