ASF Bugzilla – Bug 3877

{n} and {n,m} not thread safe

Last modified: 2005-08-10 21:13:14 UTC

In a multi-threaded environment, the REProgram produced by RECompiler for a regexp containing {n} or {n,m} is unreliable. Class RECompiler uses static variables and arrays to process the {n} and {n,m} specifications. These static variables and arrays are being modified without any synchronization between threads. The variables in question are: static int brackets = 0; // Number of bracket sets static int[] bracketStart = null; // Starting point static int[] bracketEnd = null; // Ending point static int[] bracketMin = null; // Minimum number of matches static int[] bracketOpt = null; // Additional optional matches

I fixed the problem in the compiler by changing the method void bracket()... /** * Match bracket {m,n} expression put results in bracket member variables * @exception RESyntaxException Thrown if the regular expression has invalid syntax. */ void bracket() throws RESyntaxException { // Current character must be a '{' if (idx >= len || pattern.charAt(idx++) != '{') { internalError(); } // Next char must be a digit if (idx >= len || !Character.isDigit(pattern.charAt(idx))) { syntaxError("Expected digit"); } // Get min ('m' of {m,n}) number StringBuffer number = new StringBuffer(); while (idx < len && Character.isDigit(pattern.charAt(idx))) { number.append(pattern.charAt(idx++)); } try { bracketMin[brackets] = Integer.parseInt(number.toString()); } catch (NumberFormatException e) { syntaxError("Expected valid number"); } // If out of input, fail if (idx >= len) { syntaxError("Expected comma or right bracket"); } // If end of expr, optional limit is 0 if (pattern.charAt(idx) == '}') { if (bracketMin[brackets] < 1) { syntaxError("Bad zero range"); } idx++; bracketOpt[brackets] = 0; return; } // Must have at least {m,} and maybe {m,n}. if (idx >= len || pattern.charAt(idx++) != ',') { syntaxError("Expected comma"); } // If out of input, fail if (idx >= len) { syntaxError("Expected comma or right bracket"); } // If {m,} max is unlimited if (pattern.charAt(idx) == '}') { idx++; bracketOpt[brackets] = bracketUnbounded; return; } // Next char must be a digit if (idx >= len || !Character.isDigit(pattern.charAt(idx))) { syntaxError("Expected digit"); } // Get max number number.setLength(0); while (idx < len && Character.isDigit(pattern.charAt(idx))) { number.append(pattern.charAt(idx++)); } try { bracketOpt[brackets] = Integer.parseInt(number.toString()) - bracketMin[brackets]; /**/ if (bracketMin[brackets] < 1) bracketOpt[brackets]--; /**/ } catch (NumberFormatException e) { syntaxError("Expected valid number"); } // Optional repetitions must be > 0 /* if (bracketOpt[brackets] <= 0) */ if (bracketOpt[brackets] < 0) { syntaxError("Bad range"); } // Must have close brace if (idx >= len || pattern.charAt(idx++) != '}') { syntaxError("Missing close brace"); } }

Alas, the above suggestion won't help my problem. The problem that I am having is occurring with a specification of "{9}". It is unrelated to the use of a lower limit of zero. The behavior is also not predictable, as it depends on timing of multiple threads (in this case, within a servlet). Most of the time it works fine, but every now and again... One thread is wiping out the values of bracket, bracketMin, bracketOpt, etc., which are simultaneously being used by another thread. The source of the problem is that these are static variables, not member variables.

i believe that all of regexp isn't synchronized. you need to externally sync it.

External synchronization works great in the usual case where an *instance* cannot safely be shared between threads. In this case, however, the *class* cannot safely be shared between threads. This presents an unexpected and much messier proposition. In general, we do expect to be able to create more than one instance of a class at a time.

I was having the same problem while stress testing a new applications. Since it seemed a striaght foward issue of instances using static variables as a scratch pad and I had a good reproducable test. I decided to just change the variables from static to non-static as shown in the patch below and run the tests. All the errors went away and regexp seems to still do everything we expect of it with no loss of performance. 101,105c102,106 < static int brackets = 0; // Number of bracket sets < static int[] bracketStart = null; // Starting point < static int[] bracketEnd = null; // Ending point < static int[] bracketMin = null; // Minimum number of matches < static int[] bracketOpt = null; // Additional optional matches --- > int brackets = 0; // Number of bracket sets > int[] bracketStart = null; // Starting point > int[] bracketEnd = null; // Ending point > int[] bracketMin = null; // Minimum number of matches > int[] bracketOpt = null; // Additional optional matches

I have occured the same question, and I removed the "static" keyword as above, it works until now(over 8 months). I wonder why the valiable is "static" previously? does it *should* be there?

removed static keywords

closed.

*** Bug 33051 has been marked as a duplicate of this bug. ***

*** Bug 34548 has been marked as a duplicate of this bug. ***