Bug 44517

Summary: web-app_2_4.xsd not up-to-date in TC6 servlet-api.jar
Product: Tomcat 6 Reporter: Darryl Miles <darryl>
Component: Servlet & JSP API Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: thorbjoern
Priority: P2    
Version: 6.0.14   
Target Milestone: default   
Hardware: PC   
OS: Linux   

Description Darryl Miles 2008-03-01 17:05:16 UTC
Roughly 18 months ago (possibly more), Sun updated its published web-app_2_4.xsd to correct the regex used to validate MIME types.

When a validating XML parser is used with the old regex, validation fails for MIME types such as "application/xhtml+xml", because the old regex does not allow the plus ("+") character. For example:

<mime-mapping>
	<extension>xhtml</extension>
	<mime-type>application/xhtml+xml</mime-type>
</mime-mapping>
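
A minimal sketch of how the failure can be reproduced with the JDK's standard javax.xml.validation API. Assumption: instead of the real web-app_2_4.xsd (which imports other J2EE schemas), it uses a hypothetical, trimmed-down schema that declares only a <mime-type> element restricted by the old pattern shipped in Tomcat 6.0.14 (the "-" line of the diff further down). The class and method names are illustrative only and not part of Tomcat.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration; not part of Tomcat.
final class MimeMappingValidation {
    // Hypothetical, trimmed-down schema: only a <mime-type> element restricted by the
    // old MIME-type pattern from Tomcat 6.0.14's web-app_2_4.xsd (not the real J2EE schema).
    static final String OLD_SCHEMA =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + "<xsd:element name='mime-type'><xsd:simpleType>"
            + "<xsd:restriction base='xsd:string'>"
            + "<xsd:pattern value='[\\p{L}\\-\\p{Nd}]+/[\\p{L}\\-\\p{Nd}\\.]+'/>"
            + "</xsd:restriction></xsd:simpleType></xsd:element>"
            + "</xsd:schema>";

    // Compiles a W3C XML Schema held in a string.
    static Schema compile(String schemaXml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(schemaXml)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Schema does not compile", e);
        }
    }

    // Returns true if the instance document is valid; prints the error and returns false otherwise.
    static boolean validates(Schema schema, String instanceXml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema oldSchema = compile(OLD_SCHEMA);
        System.out.println(validates(oldSchema, "<mime-type>text/html</mime-type>"));
        System.out.println(validates(oldSchema, "<mime-type>application/xhtml+xml</mime-type>"));
    }
}
```

With this schema, "text/html" validates, while "application/xhtml+xml" is rejected by the validator because neither character class in the old pattern includes "+".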


I would also propose that the Sun copyright notice be retained in the file (and I question whether the Apache Software Foundation notice should appear at all, since claiming copyright on someone else's body of work doesn't seem legal; IANAL). The original XSD also carries a Sun version number near the top, and I would like to see it echoed in the version Tomcat ships, to help manage future updates to the XSDs.

In short, the fix I am requesting is the following change:


--- javax/servlet/resources/web-app_2_4.xsd     2008-03-02 00:49:56.000000000 +0000
+++ ../web-app_2_4.xsd  2007-12-18 09:05:06.000000000 +0000
@@ -805,7 +798,7 @@

     <xsd:simpleContent>
       <xsd:restriction base="j2ee:string">
-       <xsd:pattern value="[\p{L}\-\p{Nd}]+/[\p{L}\-\p{Nd}\.]+"/>
+       <xsd:pattern value="[^\p{Cc}^\s]+/[^\p{Cc}^\s]+"/>
       </xsd:restriction>
     </xsd:simpleContent>
   </xsd:complexType>
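
The two patterns in the diff can also be compared directly. This is a sketch under an explicit assumption: it uses java.util.regex rather than an XML Schema regex engine. XSD patterns are implicitly anchored, so the whole-string matches() is used, and for the inputs shown the constructs involved (\p{L}, \p{Nd}, \p{Cc}, \s, escaped "-" and ".") behave the same way. The class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of Tomcat.
// Approximates XSD pattern matching with java.util.regex.
final class MimeTypePatternCheck {
    // Old pattern (the "-" line of the diff): letters, digits and '-' only,
    // plus '.' in the subtype.
    static final Pattern OLD_PATTERN =
            Pattern.compile("[\\p{L}\\-\\p{Nd}]+/[\\p{L}\\-\\p{Nd}\\.]+");

    // Updated pattern (the "+" line of the diff): anything except control
    // characters, whitespace and a literal '^' on either side of the slash.
    static final Pattern UPDATED_PATTERN =
            Pattern.compile("[^\\p{Cc}^\\s]+/[^\\p{Cc}^\\s]+");

    // XSD patterns are implicitly anchored, so require a whole-string match.
    static boolean isValid(Pattern pattern, String mimeType) {
        return pattern.matcher(mimeType).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"text/html", "application/xhtml+xml", "image/svg+xml", "text /html"};
        for (String s : samples) {
            System.out.printf("%-24s old=%-5b updated=%b%n",
                    "\"" + s + "\"", isValid(OLD_PATTERN, s), isValid(UPDATED_PATTERN, s));
        }
    }
}
```

The design difference is that the old pattern is a whitelist (letters, digits, "-" and, in the subtype, "."), whereas the updated one is a blacklist that excludes only control characters, whitespace and "^" (the second "^" inside the class is read as a literal by java.util.regex). Suffixed types such as "application/xhtml+xml" therefore pass, while a value containing a space is still rejected.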




I have checked Tomcat 5.5.x and it has already been updated (I believe I was involved in raising a Bugzilla entry for it at the time, 18 months or so ago).

I have also checked Tomcat 6.0.x with regard to web-app_2_5.xsd: the one shipped with 6.0.14 is based on Sun's original XSD version 1.62, but 1.68 is now out. While you are at it, perhaps the 2.5 XSD could be brought up to date as well:

-      @(#)web-app_2_5.xsds     1.62 05/08/06
+      @(#)web-app_2_5.xsds     1.68 07/03/09



The originals are available here:

http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd

http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd
Comment 1 Mark Thomas 2008-03-03 10:31:26 UTC
The MIME type regexp is fixed in 6.0.16 onwards.

See the dev archives for a discussion of the legal issues and of why we don't distribute the original Sun DTDs, but rather ones derived from the specs.

As far as I am aware the DTDs distributed with the latest stable versions of 6.0.x, 5.5.x and 4.1.x are compliant with the latest versions of the relevant specs. If you find this is not the case, please raise a new bug against the relevant version.
Comment 2 Darryl Miles 2008-03-08 19:39:17 UTC
What I'm questioning is whether a court of law would consider the XSD shipped with Apache Tomcat to be "derived" from the specification, when a 'diff' makes it clear that what is shipped is, letter for letter and in the same order, the original Sun file (minus its copyright notice). That is blatant plagiarism, and exactly what copyright law seeks to protect against. I guess the Apache legal team either does not know about this specific case or is happy to take the risk (I doubt Sun would put up any fight, but it's the removal of their copyright notice that irks me).

Maybe Sun and the JSR/JCP system should have an independent entity to hold all IP and copyright on such things, to allow simple copying into projects such as Apache Tomcat.


Also, when you say "derived from the specification", can you confirm exactly which specification and JSR erratum includes a textual description (one not itself expressed in the form of the Sun-copyrighted XSD/DTD) that updates the mime-type regex? In other words, which JSR erratum did you respond to when updating the XSD?


Thanks for confirming 6.0.16 has been refreshed.
Comment 3 Mark Thomas 2008-03-09 04:36:22 UTC
I would be grateful if you did a little more research before throwing around unfounded accusations of plagiarism. The tone of your first paragraph does little to encourage a constructive response.

The original Sun files contain wording that is incompatible with distribution under the Apache License v2 (AL2). These files should not have been checked into svn. However, some were, and they have since been fixed. The svn logs explain why the removal of the text is OK.

There is an added complication that the DTDs exist in a number of places in the repo and I wasn't consistent in which location I first fixed the files before copying them to the other locations. This means you have to check the svn log carefully for each file to determine its history.

In summary, three options were considered for ensuring the files can be legally distributed with Tomcat:

Option A. Where the original file had been contributed by a Sun employee, with Sun's full knowledge, whilst working under a Contributor License Agreement (CLA) and possibly a Corporate CLA (see http://www.apache.org/licenses/), that file should have had the restrictive text removed and the standard AL2 text added before it was first committed to an Apache repository. Where the text had not been removed before the first commit, we checked with the original committer that the contribution was intended to have been made under their CLA and, where it was, modified the file's header to reflect this.

Option B. Sun has licensed most (possibly all; I didn't check, since we didn't go down this route) of the relevant files under the CDDL. We preferred to distribute the files under AL2, and since AL2 versions of all of the files were available, we did not explore this option in detail.

Option C. Geronimo went to some pains to generate AL2-licensed, spec-derived versions of some of the DTDs. I know the ASF legal folks were involved in this, but you'd need to talk to the Geronimo folks to get the details, since Tomcat wasn't involved in the process; we just used the output. When we last went around this buoy, the Geronimo folks confirmed that these files were OK.

Most of the files could be dealt with under option A. When this option wasn't available, we went with option C.

The errata were dealt with the same way.

I am not going to get into how the JCP should work. Google for the JCP / ASF history on that one.

jcp.org seems to be inaccessible for me at the minute, so I can't give you the URL. I don't recall whether the change was sourced from the errata, from Geronimo or from somewhere else. The svn logs should shed some light on this.
Comment 4 Curt Arnold 2009-01-14 11:24:08 UTC
*** Bug 45375 has been marked as a duplicate of this bug. ***