Bug 5056

Summary: [review] (?:(?<=[\\s,]))* matches null string many times in regex; marked by <-- HERE in m/\\G(?:(?<=[\\s,]))* <-- HERE \\Z/ at /usr/lib64/perl5/5.8.8/Text/Wrap.pm line 46.
Product: Spamassassin Reporter: Carlos Velasco <carlos.velasco>
Component: spamassassinAssignee: SpamAssassin Developer Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: lwilton, sa-user
Priority: P5    
Version: 3.1.4   
Target Milestone: 3.1.8   
Hardware: Other   
OS: Linux   
Whiteboard: ready to commit
Attachments: patch
remove Text::Wrap, replace with our own code
suggested patch

Description Carlos Velasco 2006-08-20 00:53:55 UTC
After upgrading to 3.1.4, this log is displayer for every mail processed:

Aug 20 02:51:17 mail:warning spamd: spamd[17424]: (?:(?<=[\\s,]))* matches null 
string many times in regex; marked by <-- HERE in m/\\G(?:(?<=[\\s,]))* <-- 
HERE \\Z/ at /usr/lib64/perl5/5.8.8/Text/Wrap.pm line 46. 

Wrap.pm version is:
$VERSION = 2006.0711;

spamd is running as:
spamd -du filter -g filter -H /home/filter -r /var/run/spamd.pid -m 5
Comment 1 Theo Van Dinter 2006-08-20 18:07:30 UTC
I believe this came up on the IRC channel a little while ago.  The issue is
actually with Text::Wrap.  A bug was opened with them via CPAN, wherein the
response was:

"The problem comes from an unintended use of a feature."
        - http://rt.cpan.org/Public/Bug/Display.html?id=20657

I think this may only happen with new versions of Text::Wrap, so you may want to
try downgrading.  I'm apparently running an old version (2001.09292) which works
fine.
Comment 2 Carlos Velasco 2006-08-20 18:30:40 UTC
Downgraded to:
Text-Tabs+Wrap-2006.0705

The warning does not appear anymore.
Comment 3 Theo Van Dinter 2006-09-10 17:54:38 UTC
Since the bug is not in SA, I'm closing this as WFM.
Comment 4 Justin Mason 2006-11-14 14:20:38 UTC
Created attachment 3751 [details]
patch

here's a trivial patch to SA which silences the warning. it's
already in trunk.
this kind of thing would put you off CPAN...
Comment 5 Justin Mason 2006-11-14 14:21:16 UTC
might as well put it into 3.1.x?
Comment 6 Daryl C. W. O'Shea 2006-12-08 09:43:09 UTC
+1
Comment 7 Theo Van Dinter 2006-12-08 11:35:09 UTC
Sure, why not?  +1
Comment 8 Daryl C. W. O'Shea 2006-12-11 11:42:35 UTC
[dos@FC5-VPC 3.1]$ svn ci -m "bug 5052: silence buggy crap from Text::Wrap"
Sending        lib/Mail/SpamAssassin/Util.pm
Transmitting file data .
Committed revision 485839.
Comment 9 linda w 2006-12-25 21:43:12 UTC
Please don't beat me.... :-)

It looks to me like this is a bug in SpamAssassin.

While Text::Wrap did change, the change uncovered an SA bug.

The pattern '(?<=[\s,])' (Passed in the line:

./lib/Mail/SpamAssassin/PerMsgStatus.pm:996:      $hdr =
Mail::SpamAssassin::Util::wrap($hdr, "\t", "", 79, 0, '(?<=[\s,])'); 

 is used with a *.  The pattern is a zero-width assertion.  
How many times should a zero width assertion match when used
with the "*" operator (meaning repeat previous grouping until false).

What may have been meant here was simply '[\s,]'.
The pattern gets used as the "breaking" delimiter pattern to split lines.
It defaults to '\s' in the code and was _hardcoded_ as '\s' in the 
prior version that worked.

-linda
Comment 10 Theo Van Dinter 2006-12-25 22:46:05 UTC
(In reply to comment #9)
> Please don't beat me.... :-)

:)

> While Text::Wrap did change, the change uncovered an SA bug.

I disagree, please see below.

> The pattern '(?<=[\s,])' (Passed in the line:
> ./lib/Mail/SpamAssassin/PerMsgStatus.pm:996:      $hdr =
> Mail::SpamAssassin::Util::wrap($hdr, "\t", "", 79, 0, '(?<=[\s,])'); 
> 
> is used with a *.

Due to a change in Text::Wrap, yes.

> What may have been meant here was simply '[\s,]'.

Nope.  Originally that's what it was.  However, it caused bug 2165.  The
solution (r5605 if anyone's interested) was to use the zero width assertion
which avoided removing the break character.

Unless I'm missing something in the Text::Wrap docs, there's no other way to say
"break on this character, but don't remove it from the string".
Comment 11 Theo Van Dinter 2006-12-25 22:46:45 UTC
*** Bug 5253 has been marked as a duplicate of this bug. ***
Comment 12 linda w 2006-12-26 17:39:51 UTC
Theo, kindly keeper of the assassinator of spam wrote:
> Linda, "corruptor" of programs, and occasional Pain in the *** wrote:
>> While Text::Wrap did change, the change uncovered an SA bug.

T> I disagree, please see below.

>> The pattern '(?<=[\s,])' (Passed in the line:
>> ./lib/Mail/SpamAssassin/PerMsgStatus.pm:996:      $hdr =
>> Mail::SpamAssassin::Util::wrap($hdr, "\t", "", 79, 0, '(?<=[\s,])');
>> is used with a *.

T> Due to a change in Text::Wrap, yes.

>> What may have been meant here was simply '[\s,]'.

T> Nope.  Originally that's what it was.  However, it caused bug 2165.  The
T> solution (r5605 if anyone's interested) was to use the zero width assertion
T> which avoided removing the break character.
---
    I was afraid it was something like that.  But....


T> Unless I'm missing something in the Text::Wrap docs, there's no other way to say
T> "break on this character, but don't remove it from the string".
---
    Line::Break modules states:

    "It is possible to control which characters terminate words by modifying
    $Text::Wrap::break. Set this to a string such as '[\s:]' (to break
    before spaces or colons) or a pre-compiled regexp such as "qr/[\s']/"
    (to break before spaces or apostrophes). The default is simply '\s';
    that is, words are terminated by spaces."

    Unless I read it improperly, "$break" must contain characters.
A zero width assertion doesn't (as much as it would be interesting) 
qualify as a character in any character set. :-)

    Having it, both correspond to the break character, and having the code
swallow extra break characters makes sense, in the general scheme of things: the
code would look for a break character "break character" when it is time to
"break" by using '(:$break)*'.  That would swallow up extra break characters
after a printing line.

    Looking at the code for Text::Wrap (TW), I don't believe it was designed
to accept a zero-width pattern.  By not passing in a match for 1 or more
characters, I believe SA is calling Text::Width with illegal parameters.

    No matter if it is decided to maintain "SA"'s call to TW with an invalid
parameter or if it is fixed, the patch hiding a *valid* perl warning is bad form.

    Why is "," needed as a breaking character?  Would it work (i.e. not cause
bug 2165) if you just used '\s'?

    Regarding the warning, it is a valid warning about broken/bad perl code. 
The broken code:
'(zero width assertion)<unbounded wildcard>' is invalid / undefined perl code.
I wouldn't be certain what it would do -- I suppose we should feel fortunate
that perl is smart enough to catch it -- since when trying to match a pattern, a
pattern like '(zero-width-pat)*' should repeat indefinitely.

    If SA needs to "fork" the Text::Wrap code from 0705 to solve SA's
need, that would be one "valid" solution, but hiding away valid warnings about
invalid perl regular expressions or "useless use of infinitely-repeating
zero-width assertion" is just hiding a bug and asking for trouble.  

Maybe Text::Wrap does need to be forked to provide the, functionality that SA
needs?  

But please, don't shut off a valid warning.  Anyone linking with TW after the
0705 version will silently be running the bad perl code.  

It is technically experimental, but its been there, I believe since 5.8.1,
maybe the construct '([\s,]+)(?{$mybreakchars=$^N})' would provide a workaround?
You'd still have the "\s," pattern, and for any given invocation of TW, you'd
save the value of the break char such that it could be inserted back in? 
If Text::Wrap might execute that line multiple times, maybe:
'([\s,]+)(?{push @mybreakchars,$^N})' would allow pushing on successive 
break characters for each time it's used.  

It sounds potentially ugly/messy.  Is it possible just to ask for zero-width
assertion "support" in the Text::Width function?  The code in 0705 was
technically "broken", in that it didn't properly use the value in "$break" as
the value to break-on -- it used '\s'.  It still would have swallowed the break
char though.

*sigh*


-l


    
Comment 13 Theo Van Dinter 2006-12-27 22:21:09 UTC
Ok, I'm going to reopen this ticket.

Why not just drop Text::Wrap and do it ourselves?  It's not like it's a huge
amount of code, and we only make 3 calls in PerMsgStatus and nowhere else, so
the input and expected results are pretty straightforward.  We don't need (nor
do we want) to fork Text::Wrap ala comment 12.

I have a version that I'm testing and will put it into 3.2 soon.  Assuming no
major issues, we can look at backporting to 3.1.
Comment 14 Theo Van Dinter 2006-12-27 23:09:01 UTC
Created attachment 3799 [details]
remove Text::Wrap, replace with our own code

Ok, here's essentially what I put into 3.2 in r490670, with a slight difference
to the Makefile.PL diff to apply to 3.1.
Comment 15 Justin Mason 2006-12-28 02:45:18 UTC
+1 to dropping Text::Wrap. thanks Theo!
Comment 16 Theo Van Dinter 2007-01-03 17:03:52 UTC
not sure why this got closed, we should backport this to 3.1 as well to avoid
the issue there.

fwiw, just backport the Util::wrap() from 3.2 which has had a bug-fix or two
since the patch I posted here.
Comment 17 Theo Van Dinter 2007-01-31 19:53:21 UTC
backport from 3.2 patch coming shortly
Comment 18 Theo Van Dinter 2007-01-31 19:54:07 UTC
Created attachment 3851 [details]
suggested patch

backport from 3.2
Comment 19 Michael Parker 2007-01-31 20:43:43 UTC
+1
Comment 20 Justin Mason 2007-02-02 03:50:57 UTC
+1
Comment 21 Theo Van Dinter 2007-02-04 11:36:43 UTC
Committed.  Found that the patch didn't remove the remains of Text::Wrap, so I
took those lines out manually.

Sending        Makefile.PL
Sending        lib/Mail/SpamAssassin/Util.pm
Transmitting file data ..
Committed revision 503457.