Key: XERCESJ-1060
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Unassigned
Reporter: Julian Cable
Votes: 0
Watchers: 0
Xerces2-J

anyURI validation is too strict

Created: 08/Apr/05 05:31 PM   Updated: 09/May/05 12:46 PM
Component/s: XML Schema 1.0 Datatypes
Affects Version/s: 2.5.0
Fix Version/s: 2.6.0

Time Tracking:
Not Specified

Environment: Stylus Studio 6.1, Windows XP

Resolution Date: 09/May/05 12:46 PM


Description
The following error message is generated:

file:///d:/drm/code/mdigen/drmmdi.conf:10,80: Datatype error: Type:InvalidDatatypeValueException, Message:Value 'dcp.tcp.pft://192.168.0.1:1002:3002?fec=1&crc=0' is NOT a valid URI .

but this is a valid URI according to RFC 2396 and RFC 3986. The example is from ETSI TS 102 821 Annex C.



Comments
elharo added a comment - 08/Apr/05 09:32 PM
Xerces is correct. This URI is syntactically incorrect according to RFC 3986. The authority component cannot have two colons when used with an IPv4 literal address. In essence, this URI tries to have two ports. I'm not familiar with the spec you reference, but it does not appear to be conformant to the URI specification.

Michael Glavassevich added a comment - 09/Apr/05 12:46 AM
The value space for anyURI [1] is defined by RFC 2396 (and RFC 2732). dcp.tcp.pft://192.168.0.1:1002:3002?fec=1&crc=0 is allowed by the grammar since "192.168.0.1:1002:3002" matches reg_name. Registry-based Naming Authority (reg_name) has been supported since Xerces 2.6.0.

authority = server | reg_name
reg_name = 1*( unreserved | escaped | "$" | "," |
                ";" | ":" | "@" | "&" | "=" | "+" )
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" |
                "(" | ")"

If RFC 3986 prohibits this URI then it seems the new RFC is not backwards compatible with RFC 2396.

[1] http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#anyURI
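
As a rough illustration of the RFC 2396 grammar quoted above, the reg_name production can be transcribed into a Java regular expression. This is only a sketch of the grammar, not the code Xerces uses; the class name RegName2396Sketch and its isRegName method are hypothetical, and the check covers the authority component alone rather than a whole URI.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a direct transcription of the RFC 2396 reg_name
// production. It is NOT taken from the Xerces sources.
class RegName2396Sketch {
    // reg_name   = 1*( unreserved | escaped | "$" | "," | ";" | ":" | "@" | "&" | "=" | "+" )
    // unreserved = alphanum | mark
    // mark       = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
    // escaped    = "%" hex hex
    private static final Pattern REG_NAME = Pattern.compile(
        "(?:[A-Za-z0-9\\-_.!~*'()$,;:@&=+]|%[0-9A-Fa-f]{2})+");

    // Returns true when the whole string matches reg_name (one or more characters).
    static boolean isRegName(String authority) {
        return REG_NAME.matcher(authority).matches();
    }

    public static void main(String[] args) {
        // The authority from the reported URI: colons are allowed by reg_name.
        System.out.println(isRegName("192.168.0.1:1002:3002")); // true
        // A slash is not part of reg_name, so this is rejected.
        System.out.println(isRegName("host/path"));             // false
    }
}
```

Under this reading, "192.168.0.1:1002:3002" fails the server production (it has two port-like parts) but is still a legal authority because it matches reg_name.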

elharo added a comment - 09/Apr/05 01:09 AM
RFC 3986 does seem to prohibit this URI. In 3986 we have:

   reg-name = *( unreserved / pct-encoded / sub-delims )
   unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
   sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

In 3986 the colon is a general delimiter, not a sub delimiter.

I'm not sure what should be done here. Schemas Part 2 normatively references 2396, not 3986; so I suppose this should be allowed. On the other hand, I can't help but think that this is really a bug in the definition of reg-names in 2396.

elharo added a comment - 09/Apr/05 01:15 AM
The colon isn't the only issue. The @ sign is also prohibited in reg-names in 3986 and allowed in 2396. I wonder what the working group was thinking? I suspect they were trying to make it easier to distinguish reg-names from host-based authorities, and allow user info and port to be specified for registry based authorities.

This particular issue is not listed in Appendix D.2 of 3986, Modifications, so I wonder if the working group noticed it?
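
For comparison, the RFC 3986 reg-name production quoted earlier in this thread can be sketched the same way in Java. Again this is an illustration only, with a hypothetical class name (RegName3986Sketch); it tests the reg-name production in isolation and does not model the full RFC 3986 authority (userinfo, host, port).

```java
import java.util.regex.Pattern;

// Hypothetical helper: a direct transcription of the RFC 3986 reg-name
// production. It is NOT taken from the Xerces sources.
class RegName3986Sketch {
    // reg-name    = *( unreserved / pct-encoded / sub-delims )
    // unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
    // sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    // pct-encoded = "%" HEXDIG HEXDIG
    private static final Pattern REG_NAME = Pattern.compile(
        "(?:[A-Za-z0-9\\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*");

    // Returns true when the whole string matches reg-name (may be empty).
    static boolean isRegName(String host) {
        return REG_NAME.matcher(host).matches();
    }

    public static void main(String[] args) {
        // ":" and "@" are general delimiters in RFC 3986, so both are rejected.
        System.out.println(isRegName("192.168.0.1:1002:3002")); // false
        System.out.println(isRegName("user@example"));          // false
        // Without the extra colons the same characters are accepted.
        System.out.println(isRegName("192.168.0.1"));           // true
    }
}
```

The only difference that matters for this report is the character set: ":" and "@" are in the RFC 2396 reg_name set but absent from the RFC 3986 one.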

elharo added a comment - 09/Apr/05 09:45 PM
Roy Fielding has confirmed that this was a deliberate decision, and is indeed an incompatibility between 2396 and 3986. According to him, "No URI schemes were defined using the reg_name syntax of 2396, and therefore it was removed." Probably, nobody should be using such syntax now.

What to do now? This is a tough call, but I tend to fall back on the letter of the law (or the spec). The schemas spec references 2396, not 3986. Therefore Xerces should be changed to allow this syntax.

This might change in schema 1.1 though, which will likely reference 3986, not 2396. However, the current working draft still references RFC 2396. I've asked the schema working group to consider this issue.

Michael Glavassevich added a comment - 12/Apr/05 03:16 AM
Just to clarify... The report was opened against Xerces 2.5.0. Xerces has allowed the reg_name syntax since Xerces 2.6.0, so as of today the schema validator will accept dcp.tcp.pft://192.168.0.1:1002:3002?fec=1&crc=0 as a valid value of type anyURI.

Julian Cable added a comment - 14/Apr/05 04:33 PM
Interesting - you guys are doing a great job.

I raised the issue against 2.5 because that is what Stylus Studio ships with.

Given the comment from Roy Fielding, it's a shame that the Digital Radio Mondiale team who wrote ETSI TS 102 821 didn't register their URI scheme. In fact this URI scheme has some practical problems, and its being essentially deprecated by RFC 3986 is an argument I can use to try and get it changed. Luckily Annex C is informative, not normative.

I will try and get the ETSI spec changed and definitely not incorporate the deprecated URI in the XML schema we are designing. We have a Digital Radio Mondiale meeting next week where I should be able to get this going.

I think XERCES should follow (and track) the W3C standard so the current 2.6 behaviour is correct, the 2.5 behaviour is incorrect and if W3C changes the schema spec to reference 3986 then XERCES should change with it.

As far as this bug is concerned, it doesn't seem like you need to implement anything, but it would be nice if the exact syntax accepted by the tool was somewhere in the user documentation (if it is, then my mistake, but I couldn't find it by googling).

Julian

Michael Glavassevich added a comment - 09/May/05 12:46 PM
reg_name has been accepted as valid URI syntax since Xerces 2.6.0. This may change in the future if XML Schema 1.0 moves up to the RFC 3986 syntax which excludes this production. Xerces CVS currently supports the XML Schema 1.0, 2nd edition which still references RFC 2396 for the anyURI type. The version of XML Schema 1.0 supported will be clearly marked in the documentation. The relevant RFCs for anyURI may be emphasized in a FAQ.