Xerces2-J / XERCESJ-1126

Regular Expressions starting with "." in non-XML Schema mode use wrong first character optimization


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.1
    • None

    Description

      The following Java snippet prints "not matched", but should print "matched".

      RegularExpression regex = new RegularExpression(".oo", "");
      if (regex.matches("foo")) System.out.println("matched");
      else System.out.println("not matched");

      The snippet uses the class org.apache.xerces.impl.xpath.regex.RegularExpression. I believe this happens because the first-character optimization kicks in: it checks for a first character of 'o', fails to match the 'f', and consequently returns false. This may be caused by the following code from Token.java:493:

      case DOT:                               // ****
          if (isSet(options, RegularExpression.SINGLE_LINE)) {
              return FC_CONTINUE;             // **** We can not optimize.
          } else {
              return FC_CONTINUE;
              /*
               * result.addRange(0, RegularExpression.LINE_FEED-1);
               * result.addRange(RegularExpression.LINE_FEED+1, RegularExpression.CARRIAGE_RETURN-1);
               * result.addRange(RegularExpression.CARRIAGE_RETURN+1, RegularExpression.LINE_SEPARATOR-1);
               * result.addRange(RegularExpression.PARAGRAPH_SEPARATOR+1, UTF16_MAX);
               * return 1;
               */
          }

      I think it should unconditionally return FC_ANY for DOT, at least when the expression starts with '.'.
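
      To illustrate the suspected mechanism, here is a minimal, self-contained sketch of a first-character prefilter. It is not the Xerces implementation: the class FirstCharSketch, its methods, and the meaning assigned to FC_CONTINUE, FC_TERMINAL and FC_ANY are assumptions made for this example, and the toy pattern language only knows literal characters and '.'. It shows how treating '.' as "skip and look at the next token" can yield a first-character set of {'o'} for ".oo", which wrongly rejects "foo", while treating '.' as "any character" disables the prefilter.

```java
import java.util.HashSet;
import java.util.Set;

public class FirstCharSketch {
    // Result codes named after the constants mentioned in the report.
    // Their semantics here are ASSUMED for this sketch only.
    static final int FC_CONTINUE = 0; // token contributes nothing, inspect the next token
    static final int FC_TERMINAL = 1; // the first-character set is complete
    static final int FC_ANY = 2;      // any character may come first, do not optimize

    /**
     * Collects the possible first characters of a toy pattern made of
     * literals and '.'. The code returned for '.' is passed in as dotResult
     * so both behaviors can be compared.
     */
    static int analyzeFirstCharacter(String pattern, Set<Character> result, int dotResult) {
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            int ret;
            if (c == '.') {
                ret = dotResult;
            } else {
                result.add(c);
                ret = FC_TERMINAL;
            }
            if (ret != FC_CONTINUE) {
                return ret; // stop at the first token that decides the outcome
            }
        }
        return FC_CONTINUE;
    }

    /** Returns true if the prefilter lets the input through to the real matcher. */
    static boolean passesPrefilter(String pattern, String input, int dotResult) {
        Set<Character> first = new HashSet<>();
        int ret = analyzeFirstCharacter(pattern, first, dotResult);
        if (ret != FC_TERMINAL) {
            return true; // no usable set, so the optimization is skipped
        }
        return !input.isEmpty() && first.contains(input.charAt(0));
    }

    public static void main(String[] args) {
        // '.' treated as FC_CONTINUE: the set becomes {'o'} and "foo" is rejected.
        System.out.println(passesPrefilter(".oo", "foo", FC_CONTINUE)); // false
        // '.' treated as FC_ANY: no prefilter, "foo" reaches the matcher.
        System.out.println(passesPrefilter(".oo", "foo", FC_ANY));      // true
    }
}
```

      In this toy model the wrong rejection disappears as soon as '.' reports FC_ANY, which matches the behavior proposed above.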

          People

            Assignee: Michael Glavassevich (mrglavas@ca.ibm.com)
            Reporter: Martin Probst (martinp)
            Votes: 0
            Watchers: 2
