XMLBeans / XMLBEANS-412

CLONE -Pattern facet regex requires dash - to be escaped

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Version 2, Version 2.1, Version 2.2, Version 2.2.1, Version 2.3, Version 2.3.1
    • Fix Version/s: Version 2
    • Component/s: Validator
    • Labels:
      None
    • Environment:
      Win 2000, JDK1.5

      Description

      Given the following XSD, which should allow only a valid email address pattern:
      <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

      <xsd:element name="Email" type="EmailType" />

      <xsd:simpleType name="EmailType" >
      <xsd:restriction base="xsd:token">
      <xsd:pattern value="([\.a-zA-Z0-9_-])@([a-zA-Z0-9_-])(([a-zA-Z0-9_-])*\.([a-zA-Z0-9_-]))"/>
      </xsd:restriction>
      </xsd:simpleType>

      </xsd:schema>

      Using the following simple xml instance:

      <Email>test@test.com</Email>

      Running:

      validate sample.xsd sample.xml

      generates:

      Schema invalid:
      D:\sample.xsd:7: error: pattern-regex: The regular expression '([\.a-zA-Z0-9_-])@([a-zA-Z0-9_-])(([a-zA-Z0-9_-])*\.([a-zA-Z0-9_-]))' is malformed: '-' is an invalid character range. Write '\-'.

      A dash at the beginning or end of a positive character group does not have to be escaped (see http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#nt-charRange: "The - character is a valid character range only at the beginning or end of a positive character group.")

      The regular expression in the email example is a valid xsd regexp and should be accepted by the XmlBeans validator.
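      The intended meaning of the class [\.a-zA-Z0-9_-] can be illustrated outside XMLBeans. The Java sketch below uses java.util.regex. That is a different engine from the XSD regex parser under discussion, but it likewise reads a trailing unescaped '-' as a literal dash. The sketch checks that the email pattern's class accepts the same characters (over U+0000..U+00FF) whether or not the dash is escaped. DashClassDemo and sameSingleChars are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

// Sketch only: java.util.regex is not the XSD regex engine that XMLBeans uses.
// It is used here to illustrate the intended meaning of a trailing '-'.
public class DashClassDemo {

    // True if both character classes accept exactly the same single
    // characters in the range U+0000..U+00FF.
    static boolean sameSingleChars(String classA, String classB) {
        Pattern a = Pattern.compile(classA);
        Pattern b = Pattern.compile(classB);
        for (char ch = 0; ch <= 0xFF; ch++) {
            String s = String.valueOf(ch);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Character class from the email pattern: trailing dash unescaped vs. escaped.
        String unescaped = "[\\.a-zA-Z0-9_-]";
        String escaped = "[\\.a-zA-Z0-9_\\-]";
        System.out.println(sameSingleChars(unescaped, escaped)); // true
        System.out.println(Pattern.matches(unescaped, "-"));     // true
    }
}
```

      This only demonstrates the intended semantics. It says nothing about whether a given XSD processor accepts the pattern.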

      Attachments

      1. xmlbeans-412.patch (2 kB, Jerry Sy)
      2. ParserForXMLSchema.java (22 kB, Jerry Sy)


          Activity

          Jerry Sy added a comment -

          I have created a fix for this issue and am requesting that the board review it. I am attaching the patch and the changed file. It allows an unescaped '-' at the start or end of a positive character group.

          svn diff ParserForXMLSchema.java
          Index: ParserForXMLSchema.java
          ===================================================================
          --- ParserForXMLSchema.java (revision 1245727)
          +++ ParserForXMLSchema.java (working copy)
          @@ -234,9 +234,9 @@
                       if (type == T_CHAR) {
                           if (c == '[') throw this.ex("parser.cc.6", this.offset-2);
                           if (c == ']') throw this.ex("parser.cc.7", this.offset-2);
          -                // (radup) XMLSchema 1.0 allows the '-' as the first character of a range,
          -                // but it looks like XMLSchema 1.1 will prohibit it - track this
          -                if (c == '-' && !firstloop) throw this.ex("parser.cc.8", this.offset-2);
          +                //https://issues.apache.org/jira/browse/XMLBEANS-412
          +                //unescaped single char '-' is a valid char after '[' and before ']' positive range only
          +                if (c== '-' && ((!firstloop && this.chardata!=']') || nrange)) throw this.ex("parser.cc.8", this.offset-2);
                       }
                       if (this.read() != T_CHAR || this.chardata != '-') { // Here is no '-'.
                           tok.addRange(c, c);
          @@ -245,9 +245,15 @@
                           this.next(); // Skips '-'
                           if ((type = this.read()) == T_EOF) throw this.ex("parser.cc.2", this.offset);
                           // c '-' ']' -> '-' is a single-range.
          -                if ((type == T_CHAR && this.chardata == ']')
          -                    || type == T_XMLSCHEMA_CC_SUBTRACTION) {
          +                if (type == T_XMLSCHEMA_CC_SUBTRACTION) {
                               throw this.ex("parser.cc.8", this.offset-1);
          +                } else if (type == T_CHAR && this.chardata == ']') {
          +                    //'-' occurs after a single-range but before ']'
          +                    if (!nrange) {
          +                        tok.addRange(c,c);
          +                        tok.addRange('-','-');
          +                    } else
          +                        throw this.ex("parser.cc.8", this.offset-1);
                           } else {
                               int rangeend = this.chardata;
                               if (type == T_CHAR) {

          Jerry Sy added a comment -

          Suggested fix to allow an unescaped '-' at the start or end of a positive character group.

          Peter Ford added a comment -

          From my reading of http://xsd.stylusstudio.com/2007Jun/post07002.htm (which mentions bug 1889 at the w3.org Bugzilla), there might be some room for doubt on this matter.

          Peter Keller added a comment -

          For additional clarity (I hope):

          A positive char group is:

          posCharGroup ::= ( charRange | charClassEsc )+

          and '\-' is a valid charClassEsc, so using '\-' is valid via the charClassEsc branch. The following four positive character groups are all valid and equivalent:

          [A-F0-9.+-] ('-' is a valid charRange here)
          [-A-F0-9.+] ('-' is a valid charRange here)
          [A-F0-9.+\-] ('\-' is a valid charClassEsc)
          [\-A-F0-9.+] ('\-' is a valid charClassEsc)

          The following two negative character groups are valid and equivalent:

          [^A-F0-9.+\-] ('\-' is a valid charClassEsc)
          [^\-A-F0-9.+] ('\-' is a valid charClassEsc)

          and the following two are invalid:

          [^A-F0-9.+-] ('-' is not a valid charRange here)
          [^-A-F0-9.+] ('-' is not a valid charRange here)
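          The classification above can be expressed as a small standalone checker. This is a minimal sketch under stated assumptions. DashPlacement and its methods are hypothetical and are not XMLBeans API. The checker only looks at where an unescaped '-' sits, treats every backslash escape as one atom, and does not handle class subtraction. It rejects a lone '-' anywhere in a negative group. As Peter Ford points out above, that reading may be open to some doubt. It appears to be what the attached patch enforces through its nrange checks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of XMLBeans: a simplified check of where an
// unescaped '-' may appear in an XSD 1.0 character class, following the
// classification in the comment above.
public class DashPlacement {

    // Splits the class body into atoms: a single char, or a two-char escape like "\-".
    static List<String> atoms(String body) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < body.length(); i++) {
            if (body.charAt(i) == '\\' && i + 1 < body.length()) {
                out.add(body.substring(i, i + 2));
                i++; // skip the escaped character
            } else {
                out.add(String.valueOf(body.charAt(i)));
            }
        }
        return out;
    }

    // Returns true if every unescaped '-' in cls (e.g. "[A-F0-9.+-]") is either
    // the separator of a range s-e, or, in a positive group only, the first or
    // last item. Assumptions: escapes are single atoms (so "\d-z" is not
    // rejected), and class subtraction "-[...]" is not supported.
    static boolean isValid(String cls) {
        if (cls.length() < 3 || cls.charAt(0) != '[' || cls.charAt(cls.length() - 1) != ']') {
            return false;
        }
        boolean negative = cls.charAt(1) == '^';
        List<String> a = atoms(cls.substring(negative ? 2 : 1, cls.length() - 1));
        int n = a.size();
        if (n == 0) {
            return false;
        }
        int i = 0;
        while (i < n) {
            if (a.get(i).equals("-")) {
                // Lone dash: allowed only at either end of a positive group.
                if (negative || (i != 0 && i != n - 1)) {
                    return false;
                }
                i++;
            } else if (i + 2 < n && a.get(i + 1).equals("-")) {
                // Range s-e: the end point may not be an unescaped dash.
                if (a.get(i + 2).equals("-")) {
                    return false;
                }
                i += 3;
            } else {
                i++;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] examples = {
            "[A-F0-9.+-]", "[-A-F0-9.+]", "[A-F0-9.+\\-]", "[\\-A-F0-9.+]",
            "[^A-F0-9.+\\-]", "[^\\-A-F0-9.+]", "[^A-F0-9.+-]", "[^-A-F0-9.+]"
        };
        for (String ex : examples) {
            System.out.println(ex + " -> " + (isValid(ex) ? "valid" : "invalid"));
        }
    }
}
```

          Running main prints "valid" for the first six groups and "invalid" for the last two, matching the lists above. The sketch also accepts the email class [\.a-zA-Z0-9_-] and rejects a dash in the middle, as in [a-c-e].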

          Julien HENRY added a comment -

          I can confirm this is a bug in XMLBeans.

          According to the spec, [A-F0-9.+-]* is a valid regular expression, but XMLBeans fails with:
          error: pattern-regex: The regular expression '[A-F0-9.+-]*' is malformed: '-' is an invalid character range. Write '\-'.

          Radosław Ceszkiel added a comment -

          The bug XMLBEANS-224 was closed as resolved. It should not have been, because a dash at the beginning or end of a positive character group does not have to be escaped (see http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#nt-charRange: "The - character is a valid character range only at the beginning or end of a positive character group.")

          The regular expression in the email example is a valid XSD regexp and should be accepted by the XmlBeans validator.


            People

            • Assignee: Unassigned
            • Reporter: Radosław Ceszkiel
            • Votes: 3
            • Watchers: 3
