Bug 3877 - {n} and {n,m} not thread safe
{n} and {n,m} not thread safe
Status: CLOSED FIXED
Product: Regexp
Classification: Unclassified
Component: Other
unspecified
PC All
: P3 normal (vote)
: ---
Assigned To: Jakarta Notifications Mailing List
:
: 33051 34548 (view as bug list)
Depends on:
Blocks:
  Show dependency tree
 
Reported: 2001-09-28 14:43 UTC by Doug Pardee
Modified: 2005-08-10 21:13 UTC (History)
2 users (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Doug Pardee 2001-09-28 14:43:01 UTC
In a multi-threaded environment, the REProgram produced by RECompiler for a 
regexp containing {n} or {n,m} is unreliable.

Class RECompiler uses static variables and arrays to process the {n} and {n,m} 
specifications. These static variables and arrays are being modified without 
any synchronization between threads.

The variables in question are:

    static int brackets = 0;                    // Number of bracket sets
    static int[] bracketStart = null;           // Starting point
    static int[] bracketEnd = null;             // Ending point
    static int[] bracketMin = null;             // Minimum number of matches
    static int[] bracketOpt = null;             // Additional optional matches
Comment 1 Chris Scheuble 2001-09-28 14:53:38 UTC
I fixed the problem in the compiler by changing the method void bracket()...

    /**
     * Match bracket {m,n} expression put results in bracket member variables
     * @exception RESyntaxException Thrown if the regular expression has 
invalid syntax.
     */
    void bracket() throws RESyntaxException
    {
        // Current character must be a '{'
        if (idx >= len || pattern.charAt(idx++) != '{')
        {
            internalError();
        }

        // Next char must be a digit
        if (idx >= len || !Character.isDigit(pattern.charAt(idx)))
        {
            syntaxError("Expected digit");
        }

        // Get min ('m' of {m,n}) number
        StringBuffer number = new StringBuffer();
        while (idx < len && Character.isDigit(pattern.charAt(idx)))
        {
            number.append(pattern.charAt(idx++));
        }
        try
        {
            bracketMin[brackets] = Integer.parseInt(number.toString());
        }
        catch (NumberFormatException e)
        {
            syntaxError("Expected valid number");
        }

        // If out of input, fail
        if (idx >= len)
        {
            syntaxError("Expected comma or right bracket");
        }

        // If end of expr, optional limit is 0
        if (pattern.charAt(idx) == '}')
        {
            if (bracketMin[brackets] < 1)
            {
                syntaxError("Bad zero range");
            }

            idx++;
            bracketOpt[brackets] = 0;
            return;
        }

        // Must have at least {m,} and maybe {m,n}.
        if (idx >= len || pattern.charAt(idx++) != ',')
        {
            syntaxError("Expected comma");
        }

        // If out of input, fail
        if (idx >= len)
        {
            syntaxError("Expected comma or right bracket");
        }

        // If {m,} max is unlimited
        if (pattern.charAt(idx) == '}')
        {
            idx++;
            bracketOpt[brackets] = bracketUnbounded;
            return;
        }

        // Next char must be a digit
        if (idx >= len || !Character.isDigit(pattern.charAt(idx)))
        {
            syntaxError("Expected digit");
        }

        // Get max number
        number.setLength(0);
        while (idx < len && Character.isDigit(pattern.charAt(idx)))
        {
            number.append(pattern.charAt(idx++));
        }
        try
        {
            bracketOpt[brackets] = Integer.parseInt(number.toString()) - 
bracketMin[brackets];
/**/
            if (bracketMin[brackets] < 1)
                bracketOpt[brackets]--;
/**/
        }
        catch (NumberFormatException e)
        {
            syntaxError("Expected valid number");
        }

        // Optional repetitions must be > 0
/*
        if (bracketOpt[brackets] <= 0)
*/
        if (bracketOpt[brackets] < 0)
        {
            syntaxError("Bad range");
        }

        // Must have close brace
        if (idx >= len || pattern.charAt(idx++) != '}')
        {
            syntaxError("Missing close brace");
        }
    }

Comment 2 Doug Pardee 2001-09-28 15:22:01 UTC
Alas, the above suggestion won't help my problem.

The problem that I am having is occurring with a specification of "{9}". It is 
unrelated to the use of a lower limit of zero. The behavior is also not 
predictable, as it depends on timing of multiple threads (in this case, within 
a servlet). Most of the time it works fine, but every now and again...

One thread is wiping out the values of bracket, bracketMin, bracketOpt, etc., 
which are simultaneously being used by another thread. The source of the 
problem is that these are static variables, not member variables.
Comment 3 Jon Stevens 2001-09-28 15:43:31 UTC
i believe that all of regexp isn't synchronized. you need to externally sync it.
Comment 4 Doug Pardee 2001-09-28 16:05:28 UTC
External synchronization works great in the usual case where an *instance* 
cannot safely be shared between threads. In this case, however, the *class* 
cannot safely be shared between threads. This presents an unexpected and much 
messier proposition. In general, we do expect to be able to create more than 
one instance of a class at a time.
Comment 5 Keith Gross 2002-08-30 15:12:25 UTC
I was having the same problem while stress testing a new applications.  Since it
seemed a striaght foward issue of instances using static variables as a scratch
pad and I had a good reproducable test. I decided to just change the variables
from static to non-static as shown in the patch below and run the tests.  All
the errors went away and regexp seems to still do everything we expect of it
with no loss of performance.

101,105c102,106
<     static int brackets = 0;                            // Number of bracket sets
<     static int[] bracketStart = null;                   // Starting point
<     static int[] bracketEnd = null;                     // Ending point
<     static int[] bracketMin = null;                     // Minimum number of
matches
<     static int[] bracketOpt = null;                     // Additional optional
matches
---
>     int brackets = 0;                            // Number of bracket sets
>     int[] bracketStart = null;                   // Starting point
>     int[] bracketEnd = null;                     // Ending point
>     int[] bracketMin = null;                     // Minimum number of matches
>     int[] bracketOpt = null;                     // Additional optional matches
Comment 6 liu lianhui 2002-12-13 08:47:18 UTC
I have occured the same question, and I removed the "static" keyword as above,
it works until 
now(over 8 months).
I wonder why the valiable is "static" previously? does it *should* be 
there?
Comment 7 Jon Stevens 2002-12-13 17:50:57 UTC
removed static keywords
Comment 8 Jon Stevens 2002-12-13 17:51:13 UTC
closed.
Comment 9 Vadim Gritsenko 2005-01-11 22:34:19 UTC
*** Bug 33051 has been marked as a duplicate of this bug. ***
Comment 10 Vadim Gritsenko 2005-08-11 05:13:14 UTC
*** Bug 34548 has been marked as a duplicate of this bug. ***