Bug 3879

Summary: Expressions using {0,n} match 0 to n+1 times instead of 0 to n times
Product: Regexp
Reporter: Chris Scheuble <chriss>
Component: Other
Assignee: Jakarta Notifications Mailing List <notifications>
Status: CLOSED DUPLICATE    
Severity: normal    
Priority: P3    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: All   

Description Chris Scheuble 2001-09-28 15:50:36 UTC
Expressions using {0,n} match 0 to n+1 times instead of 0 to n times.

Expression "[a-z]{0,3}" against "123abcdefg123" matches "abcd" not "abc".
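
For comparison, here is a minimal sketch of the expected upper bound. It uses java.util.regex rather than Jakarta Regexp, an assumption made only so the check runs without the library under test, and matchAt is a hypothetical helper. Anchored at the first letter, {0,3} should consume at most three characters:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundedQuantifierCheck {
    // Hypothetical helper: try to match `regex` starting exactly at offset
    // `start` of `input`; returns the matched text, or null if no match.
    static String matchAt(String regex, String input, int start) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.region(start, input.length());
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Offset 3 is the 'a' in "123abcdefg123".
        System.out.println(matchAt("[a-z]{0,3}", "123abcdefg123", 3)); // abc
    }
}
```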

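I fixed the problem in the compiler by changing the method void bracket() as shown below. The added lines are delimited by /**/ markers, and the original range check is left commented out.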

    /**
     * Match bracket {m,n} expression and put the results in the bracket member variables
     * @exception RESyntaxException Thrown if the regular expression has invalid syntax.
     */
    void bracket() throws RESyntaxException
    {
        // Current character must be a '{'
        if (idx >= len || pattern.charAt(idx++) != '{')
        {
            internalError();
        }

        // Next char must be a digit
        if (idx >= len || !Character.isDigit(pattern.charAt(idx)))
        {
            syntaxError("Expected digit");
        }

        // Get min ('m' of {m,n}) number
        StringBuffer number = new StringBuffer();
        while (idx < len && Character.isDigit(pattern.charAt(idx)))
        {
            number.append(pattern.charAt(idx++));
        }
        try
        {
            bracketMin[brackets] = Integer.parseInt(number.toString());
        }
        catch (NumberFormatException e)
        {
            syntaxError("Expected valid number");
        }

        // If out of input, fail
        if (idx >= len)
        {
            syntaxError("Expected comma or right bracket");
        }

        // If end of expr, optional limit is 0
        if (pattern.charAt(idx) == '}')
        {
            if (bracketMin[brackets] < 1)
            {
                syntaxError("Bad zero range");
            }

            idx++;
            bracketOpt[brackets] = 0;
            return;
        }

        // Must have at least {m,} and maybe {m,n}.
        if (idx >= len || pattern.charAt(idx++) != ',')
        {
            syntaxError("Expected comma");
        }

        // If out of input, fail
        if (idx >= len)
        {
            syntaxError("Expected comma or right bracket");
        }

        // If {m,} max is unlimited
        if (pattern.charAt(idx) == '}')
        {
            idx++;
            bracketOpt[brackets] = bracketUnbounded;
            return;
        }

        // Next char must be a digit
        if (idx >= len || !Character.isDigit(pattern.charAt(idx)))
        {
            syntaxError("Expected digit");
        }

        // Get max number
        number.setLength(0);
        while (idx < len && Character.isDigit(pattern.charAt(idx)))
        {
            number.append(pattern.charAt(idx++));
        }
        try
        {
            bracketOpt[brackets] = Integer.parseInt(number.toString())
                    - bracketMin[brackets];
/**/
            if (bracketMin[brackets] < 1)
                bracketOpt[brackets]--;
/**/
        }
        catch (NumberFormatException e)
        {
            syntaxError("Expected valid number");
        }

        // Optional repetitions must be > 0
/*
        if (bracketOpt[brackets] <= 0)
*/
        if (bracketOpt[brackets] < 0)
        {
            syntaxError("Bad range");
        }

        // Must have close brace
        if (idx >= len || pattern.charAt(idx++) != '}')
        {
            syntaxError("Missing close brace");
        }
    }
Comment 1 Jon Stevens 2002-12-13 18:42:32 UTC
patches applied and tested
Comment 2 Jon Stevens 2002-12-13 18:42:45 UTC
closed
Comment 3 Vadim Gritsenko 2003-04-24 19:29:30 UTC
I believe this is not the correct solution; the output of testcase #174 must be '', and Perl
agrees with me:

#!/usr/bin/perl
print "Matching '123abcdefg123' with regexp '([a-z]{0,3})':\n";
if ("123abcdefg123" =~ /([a-z]{0,3})/) {
    print "Matches. Result: '$1'\n";
}

Output:
Matching '123abcdefg123' with regexp '([a-z]{0,3})':
Matches. Result: ''
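
The same leftmost-match behaviour can be sketched in Java. This uses java.util.regex instead of Jakarta Regexp (an assumption, chosen so the check is runnable on its own), and firstGroup is a hypothetical helper. Because {0,3} may match zero characters, the first match is the empty string at offset 0, before the digits:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LeftmostMatchCheck {
    // Hypothetical helper: return capture group 1 of the first (leftmost)
    // match of `regex` in `input`, or null if there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String result = firstGroup("([a-z]{0,3})", "123abcdefg123");
        System.out.println("Matches. Result: '" + result + "'"); // Matches. Result: ''
    }
}
```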

Patch will follow...
Comment 4 Vadim Gritsenko 2003-04-25 17:55:52 UTC

*** This bug has been marked as a duplicate of 19329 ***
Comment 5 Vadim Gritsenko 2003-05-02 01:09:33 UTC
Fixed by Bug #19329