|
SA Bugzilla – Full Text Bug Listing |
Summary: | [review] (?:(?<=[\\s,]))* matches null string many times in regex; marked by <-- HERE in m/\\G(?:(?<=[\\s,]))* <-- HERE \\Z/ at /usr/lib64/perl5/5.8.8/Text/Wrap.pm line 46. | ||
---|---|---|---|
Product: | Spamassassin | Reporter: | Carlos Velasco <carlos.velasco> |
Component: | spamassassin | Assignee: | SpamAssassin Developer Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | lwilton, sa-user |
Priority: | P5 | ||
Version: | 3.1.4 | ||
Target Milestone: | 3.1.8 | ||
Hardware: | Other | ||
OS: | Linux | ||
Whiteboard: | ready to commit | ||
Attachments: |
patch
remove Text::Wrap, replace with our own code suggested patch |
Description
Carlos Velasco
2006-08-20 00:53:55 UTC
I believe this came up on the IRC channel a little while ago. The issue is actually with Text::Wrap. A bug was opened with them via CPAN, wherein the response was: "The problem comes from an unintended use of a feature." - http://rt.cpan.org/Public/Bug/Display.html?id=20657 I think this may only happen with new versions of Text::Wrap, so you may want to try downgrading. I'm apparently running an old version (2001.09292) which works fine. Downgraded to: Text-Tabs+Wrap-2006.0705 The warning does not appear anymore. Since the bug is not in SA, I'm closing this as WFM. Created attachment 3751 [details]
patch
here's a trivial patch to SA which silences the warning. it's
already in trunk.
this kind of thing would put you off CPAN...
might as well put it into 3.1.x? +1 Sure, why not? +1 [dos@FC5-VPC 3.1]$ svn ci -m "bug 5052: silence buggy crap from Text::Wrap" Sending lib/Mail/SpamAssassin/Util.pm Transmitting file data . Committed revision 485839. Please don't beat me.... :-) It looks to me like this is a bug in SpamAssassin. While Text::Wrap did change, the change uncovered an SA bug. The pattern '(?<=[\s,])' (Passed in the line: ./lib/Mail/SpamAssassin/PerMsgStatus.pm:996: $hdr = Mail::SpamAssassin::Util::wrap($hdr, "\t", "", 79, 0, '(?<=[\s,])'); is used with a *. The pattern is a zero-width assertion. How many times should a zero width assertion match when used with the "*" operator (meaning repeat previous grouping until false). What may have been meant here was simply '[\s,]'. The pattern gets used as the "breaking" delimiter pattern to split lines. It defaults to '\s' in the code and was _hardcoded_ as '\s' in the prior version that worked. -linda (In reply to comment #9) > Please don't beat me.... :-) :) > While Text::Wrap did change, the change uncovered an SA bug. I disagree, please see below. > The pattern '(?<=[\s,])' (Passed in the line: > ./lib/Mail/SpamAssassin/PerMsgStatus.pm:996: $hdr = > Mail::SpamAssassin::Util::wrap($hdr, "\t", "", 79, 0, '(?<=[\s,])'); > > is used with a *. Due to a change in Text::Wrap, yes. > What may have been meant here was simply '[\s,]'. Nope. Originally that's what it was. However, it caused bug 2165. The solution (r5605 if anyone's interested) was to use the zero width assertion which avoided removing the break character. Unless I'm missing something in the Text::Wrap docs, there's no other way to say "break on this character, but don't remove it from the string". *** Bug 5253 has been marked as a duplicate of this bug. *** Theo, kindly keeper of the assassinator of spam wrote: > Linda, "corruptor" of programs, and occasional Pain in the *** wrote: >> While Text::Wrap did change, the change uncovered an SA bug. T> I disagree, please see below. >> The pattern '(?<=[\s,])' (Passed in the line: >> ./lib/Mail/SpamAssassin/PerMsgStatus.pm:996: $hdr = >> Mail::SpamAssassin::Util::wrap($hdr, "\t", "", 79, 0, '(?<=[\s,])'); >> is used with a *. T> Due to a change in Text::Wrap, yes. >> What may have been meant here was simply '[\s,]'. T> Nope. Originally that's what it was. However, it caused bug 2165. The T> solution (r5605 if anyone's interested) was to use the zero width assertion T> which avoided removing the break character. --- I was afraid it was something like that. But.... T> Unless I'm missing something in the Text::Wrap docs, there's no other way to say T> "break on this character, but don't remove it from the string". --- Line::Break modules states: "It is possible to control which characters terminate words by modifying $Text::Wrap::break. Set this to a string such as '[\s:]' (to break before spaces or colons) or a pre-compiled regexp such as "qr/[\s']/" (to break before spaces or apostrophes). The default is simply '\s'; that is, words are terminated by spaces." Unless I read it improperly, "$break" must contain characters. A zero width assertion doesn't (as much as it would be interesting) qualify as a character in any character set. :-) Having it, both correspond to the break character, and having the code swallow extra break characters makes sense, in the general scheme of things: the code would look for a break character "break character" when it is time to "break" by using '(:$break)*'. That would swallow up extra break characters after a printing line. Looking at the code for Text::Wrap (TW), I don't believe it was designed to accept a zero-width pattern. By not passing in a match for 1 or more characters, I believe SA is calling Text::Width with illegal parameters. No matter if it is decided to maintain "SA"'s call to TW with an invalid parameter or if it is fixed, the patch hiding a *valid* perl warning is bad form. Why is "," needed as a breaking character? Would it work (i.e. not cause bug 2165) if you just used '\s'? Regarding the warning, it is a valid warning about broken/bad perl code. The broken code: '(zero width assertion)<unbounded wildcard>' is invalid / undefined perl code. I wouldn't be certain what it would do -- I suppose we should feel fortunate that perl is smart enough to catch it -- since when trying to match a pattern, a pattern like '(zero-width-pat)*' should repeat indefinitely. If SA needs to "fork" the Text::Wrap code from 0705 to solve SA's need, that would be one "valid" solution, but hiding away valid warnings about invalid perl regular expressions or "useless use of infinitely-repeating zero-width assertion" is just hiding a bug and asking for trouble. Maybe Text::Wrap does need to be forked to provide the, functionality that SA needs? But please, don't shut off a valid warning. Anyone linking with TW after the 0705 version will silently be running the bad perl code. It is technically experimental, but its been there, I believe since 5.8.1, maybe the construct '([\s,]+)(?{$mybreakchars=$^N})' would provide a workaround? You'd still have the "\s," pattern, and for any given invocation of TW, you'd save the value of the break char such that it could be inserted back in? If Text::Wrap might execute that line multiple times, maybe: '([\s,]+)(?{push @mybreakchars,$^N})' would allow pushing on successive break characters for each time it's used. It sounds potentially ugly/messy. Is it possible just to ask for zero-width assertion "support" in the Text::Width function? The code in 0705 was technically "broken", in that it didn't properly use the value in "$break" as the value to break-on -- it used '\s'. It still would have swallowed the break char though. *sigh* -l Ok, I'm going to reopen this ticket. Why not just drop Text::Wrap and do it ourselves? It's not like it's a huge amount of code, and we only make 3 calls in PerMsgStatus and nowhere else, so the input and expected results are pretty straightforward. We don't need (nor do we want) to fork Text::Wrap ala comment 12. I have a version that I'm testing and will put it into 3.2 soon. Assuming no major issues, we can look at backporting to 3.1. Created attachment 3799 [details] remove Text::Wrap, replace with our own code Ok, here's essentially what I put into 3.2 in r490670, with a slight difference to the Makefile.PL diff to apply to 3.1. +1 to dropping Text::Wrap. thanks Theo! not sure why this got closed, we should backport this to 3.1 as well to avoid the issue there. fwiw, just backport the Util::wrap() from 3.2 which has had a bug-fix or two since the patch I posted here. backport from 3.2 patch coming shortly Created attachment 3851 [details]
suggested patch
backport from 3.2
+1 +1 Committed. Found that the patch didn't remove the remains of Text::Wrap, so I took those lines out manually. Sending Makefile.PL Sending lib/Mail/SpamAssassin/Util.pm Transmitting file data .. Committed revision 503457. |