Uploaded image for project: 'Xerces2-J'
  1. Xerces2-J
  2. XERCESJ-1377

Thread safety problem in RegularExpression

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.9.0, 2.9.1
    • 2.10.0
    • None

    Description

      String which should have matched pattern was incorrectly reported as not matching. On re-submission the same string was correctly reported as matching. This has now happened twice in the first few daysof use of a system processing around 500K messages per day, i.e., problem is highly intermittent.

      The symptoms appear to indicate a thread safety problem and although I had understood RegularExpression to be thread-safe, looking at the code it does seem to be wrong.

      The application in question is using 2.9.0 but looking at RegularExpression.java in 2.9.1 the algorithm appears the same and is described below.

      The thread safety algorithm depends on a separate Context object being allocated on the stack if the Context object referenced by RegularExpression's instance variable is "inuse". For example, line 1420.

      synchronized (this.context)

      { con = this.context.inuse ? new Context() : this.context; con.reset(target, start, end, this.numberOfClosures); }

      The "inuse" boolean is set to true by the reset method inside the synchronized code section above, however it is set to false in a non-synchronized section, prior to each return point, e.g., at line 1449.

      con.inuse = false;
      return true;

      The "inuse" boolean is not declared as volatile, and so I believe the absence of synchronization is wrong and makes this class NOT thread safe.

      E.g., it is vulnerable to what the JLS second edition called a "Prescient Store" optimisation taking place, which could explain the behaviour I am seeing - the inuse of this.context being set to false earlier than would be expected could lead to concurrent use of a Context object which is not thread safe. Since all return points from methods like "matches" do set "inuse" to false, a "prescient store" optimisation to set it to false before actually performing the match is quite plausible.

      Although the term "prescient store" is not used in the JLS third edition I believe the semantics described there for non-volatile field access in non-synchronized code regions still allow this possibility of an optimisation re-ordering the clearing of the inuse flag so that it happens BEFORE the actual use of the Context object.

      In terms of a solution, one possibility is to declare "inuse" as volatile, another is to use a synchronized "setInUse" method on the Context.

      A third possibility would be to dispense with the approach of re-using Context objects via an instance variable reference, and always allocate a Context on the stack. I also note that if this was done, and if the "prepare" method was not invoked lazily on the first match but was invoked up front as part of setting the pattern, there would be no need for any kind of synchronization within the "matches" methods. This could give the optimum for heavy concurrent use of a common pattern in highly-multithreaded environment, but of course has trade-offs in other regards.

      Attachments

        Activity

          People

            mrglavas@ca.ibm.com Michael Glavassevich
            peter.geraghty Peter Geraghty
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: