Xerces2-J / XERCESJ-1130

Cannot validate against multiple XML schemas within the same namespace

    Details

      Description

      The javax.xml.validation.SchemaFactory.newSchema method can take an array of schema sources and construct a composite schema. Unfortunately, this does not work if the schemas are in the same namespace: in that case, only the first schema is used.

      The bug is caused by the computation of the hash code for a grammar in the grammar cache. The hash code is computed based only on the namespace of a grammar (i.e. schema file). Thus, if multiple schemas from the same namespace are specified, they all hash to the same code, and only the first is cached and returned for all subsequent caching attempts.

      If each schema is in a separate namespace, the newSchema method works properly. Effectively, the code assumes that there will only be a single schema per namespace. This is an incorrect assumption.

      One major implication of this bug is that the XML Schema "include" element can never work, since by definition included schemas must be in the same namespace. Because of this bug, composition can only be done using the "import" element, since it supports different namespaces.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1. Create a sample XML file called test.xml.
      2. Create a schema for the test file, but split it between two schema files: schema1.xsd and schema2.xsd.
      3. Create a Java class that extends DefaultHandler and uses SAX to parse the file (DOM shows the same problem).
      4. Call the SchemaFactory.newSchema method with a source array containing schema1.xsd and schema2.xsd.
      5. Set the resulting schema on the parser.
      6. Implement an error method in the class.
      7. Run the program and observe that there is a validation failure. This is because the second schema in the array was never processed, so the validator does not accept the test XML file.

      The source code, test XML file, and schemas are included in this bug report.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The test XML file should have validated properly against the composite schema.
      ACTUAL -
      There was a validation failure. This is because the second schema in the array was never processed, so the validator does not accept the test XML file.

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      SAX Error: http://www.w3.org/TR/xml-schema-1#cvc-elt.1?root

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      ------------- SchemaTest.java ----------------------------

      import java.io.FileInputStream;
      import java.io.FileNotFoundException;
      import java.io.IOException;
      import javax.xml.XMLConstants;
      import javax.xml.parsers.ParserConfigurationException;
      import javax.xml.parsers.SAXParser;
      import javax.xml.parsers.SAXParserFactory;
      import javax.xml.transform.stream.StreamSource;
      import javax.xml.validation.Schema;
      import javax.xml.validation.SchemaFactory;
      import org.xml.sax.Attributes;
      import org.xml.sax.SAXException;
      import org.xml.sax.SAXParseException;
      import org.xml.sax.helpers.DefaultHandler;

      public class SchemaTest extends DefaultHandler {

          public static void main(String[] args) {
              try {
                  FileInputStream is = new FileInputStream("test.xml");

                  // Two schema documents that share the same (absent) target namespace.
                  StreamSource[] sources = new StreamSource[2];
                  FileInputStream ss = new FileInputStream("schema2.xsd");
                  sources[0] = new StreamSource(ss);
                  ss = new FileInputStream("schema1.xsd");
                  sources[1] = new StreamSource(ss);

                  // Build the composite schema from both sources.
                  SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
                  Schema schema = schemaFactory.newSchema(sources);

                  // Validation comes from the Schema object, not from DTD validation.
                  SAXParserFactory saxFactory = SAXParserFactory.newInstance();
                  saxFactory.setNamespaceAware(true);
                  saxFactory.setValidating(false);
                  saxFactory.setXIncludeAware(true);
                  saxFactory.setSchema(schema);

                  SAXParser parser = saxFactory.newSAXParser();
                  parser.parse(is, new SchemaTest());
              } catch (FileNotFoundException e) {
                  e.printStackTrace();
              } catch (SAXException e) {
                  e.printStackTrace();
              } catch (ParserConfigurationException e) {
                  e.printStackTrace();
              } catch (IOException e) {
                  e.printStackTrace();
              }
          }

          /** @see org.xml.sax.ErrorHandler#warning(org.xml.sax.SAXParseException) */
          public void warning(SAXParseException e) throws SAXException {
              System.err.println("SAX Warning: " + e.getLocalizedMessage());
          }

          /** @see org.xml.sax.ErrorHandler#error(org.xml.sax.SAXParseException) */
          public void error(SAXParseException e) throws SAXException {
              System.err.println("SAX Error: " + e.getLocalizedMessage());
          }

          /** @see org.xml.sax.ErrorHandler#fatalError(org.xml.sax.SAXParseException) */
          public void fatalError(SAXParseException e) throws SAXException {
              System.err.println("SAX Fatal Error: " + e.getLocalizedMessage());
          }

          /** @see org.xml.sax.ContentHandler#startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes) */
          public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
              System.out.println("Start element: " + localName);
          }

          /** @see org.xml.sax.ContentHandler#endElement(java.lang.String, java.lang.String, java.lang.String) */
          public void endElement(String uri, String localName, String qName) throws SAXException {
              System.out.println("End element: " + localName);
          }
      }
      // End of class SchemaTest

      -------------------- test.xml -------------------------

      <?xml version="1.0"?>
      <root>
      <elem1 value="abc"/>
      </root>

      ---------------- schema1.xsd ------------------------

      <?xml version="1.0" encoding="UTF-8"?>
      <xs:schema elementFormDefault="qualified" xml:lang="EN" xmlns:xs="http://www.w3.org/2001/XMLSchema">

      <xs:include schemaLocation="schema2.xsd"/>

      <xs:element name="root">
      <xs:complexType>
      <xs:all>
      <xs:element ref="elem1" minOccurs="0"/>
      </xs:all>
      </xs:complexType>
      </xs:element>
      <xs:element name="elem1">
      <xs:complexType>
      <xs:attribute name="value" type="ValueType" use="required"/>
      </xs:complexType>
      </xs:element>
      </xs:schema>

      --------------------- schema2.xsd ------------------------------

      <?xml version="1.0" encoding="UTF-8"?>
      <xs:schema elementFormDefault="qualified" xml:lang="EN" xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:simpleType name="ValueType">
      <xs:restriction base="xs:string">
      <xs:enumeration value="abc"/>
      <xs:enumeration value="def"/>
      </xs:restriction>
      </xs:simpleType>
      </xs:schema>

      ---------- END SOURCE ----------

      1. MyEntityResolver.java (1 kB, Shakya Wijerama)
      2. SAXContentHandler.java (2 kB, Shakya Wijerama)
      3. source_v0.2.tar.gz (7 kB, Shakya Wijerama)
      4. XMLSchemaFactory.java (30 kB, Shakya Wijerama)

        Activity

        Sunitha created issue -
        Henrik Segesten added a comment -

        Hi

        There has been no activity for almost three years.
        Does anyone have any news on it?

        Greetings
        Henrik

        JJC added a comment -

        Is this problem going to be fixed?

        Is there a workaround for this?

        Martin Vanek added a comment -

        2.11.0 and it still does not work

        Michael Glavassevich added a comment -

        Patches are welcome.

        Mukul Gandhi added a comment (edited) -

        Here is some quick analysis related to this bug report.

        According to http://xerces.apache.org/xerces2-j/javadocs/api/javax/xml/parsers/SAXParserFactory.html, both setSchema and setXIncludeAware are available since Java 1.5.

        Therefore, in a Java 1.5 environment, try changing the source code fragment cited in this bug report to:

        StreamSource[] sources = new StreamSource[1];
        FileInputStream ss = new FileInputStream("schema1.xsd");
        sources[0] = new StreamSource(ss);

        instead of using the two-element sources array. The one-element array should be sufficient here, since schema2.xsd is resolved through the include in schema1.xsd when only schema1.xsd is loaded. This works for me.
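The workaround above can be sketched as a self-contained program. This is only an illustration, assuming the JAXP implementation bundled with the JDK; the class name SingleSourceWorkaround, the abbreviated schema strings, and the temporary directory are made up for the example. It loads schema1.xsd from a File (so the source carries a system ID against which the relative include can be resolved) and validates with javax.xml.validation.Validator.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SingleSourceWorkaround {

    // Abbreviated copies of schema1.xsd and schema2.xsd from the report.
    static final String SCHEMA1 =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:include schemaLocation='schema2.xsd'/>"
        + "<xs:element name='root'><xs:complexType><xs:all>"
        + "<xs:element ref='elem1' minOccurs='0'/>"
        + "</xs:all></xs:complexType></xs:element>"
        + "<xs:element name='elem1'><xs:complexType>"
        + "<xs:attribute name='value' type='ValueType' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:schema>";

    static final String SCHEMA2 =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:simpleType name='ValueType'><xs:restriction base='xs:string'>"
        + "<xs:enumeration value='abc'/><xs:enumeration value='def'/>"
        + "</xs:restriction></xs:simpleType>"
        + "</xs:schema>";

    /** Writes both schema files to a temporary directory and loads only schema1.xsd. */
    static Schema loadSchema() {
        try {
            Path dir = Files.createTempDirectory("xercesj1130");
            Path schema1 = dir.resolve("schema1.xsd");
            Files.write(schema1, SCHEMA1.getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("schema2.xsd"), SCHEMA2.getBytes(StandardCharsets.UTF_8));

            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // StreamSource(File) records the file's system ID, which gives the
            // loader a base location for the relative xs:include.
            return factory.newSchema(new StreamSource(schema1.toFile()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (SAXException e) {
            throw new IllegalStateException("schema could not be compiled", e);
        }
    }

    /** Returns true if the document validates, false if the validator reports an error. */
    static boolean validates(String xml) {
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validates("<root><elem1 value='abc'/></root>"));
        System.out.println(validates("<root><elem1 value='xyz'/></root>"));
    }
}
```

Note that this only covers the case where one schema document already includes the others; it does not address independent same-namespace documents passed as an array.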

        In a JDK 1.4 environment, Validator.validate(..) should be fine for validation with the schemas presented in this test case, and again only the one-element array should be used.

        Therefore this doesn't look like a bug to me (at least with Xerces 2.11.0).

        Michael Glavassevich made changes -
        Labels gsoc gsoc2012
        Michael Glavassevich added a comment -

        From a posting I made to j-dev@xerces.apache.org on 02/23/2012:

        I understand what Mukul is saying, but disagree that Xerces is working correctly. He described a workaround for that specific case, not a general solution.

        I believe users should be able to provide an array of schema documents to SchemaFactory.newSchema() which all have the same namespace and that Xerces should be fixed so that it can process that.

        It should be possible for the SchemaFactory implementation to internally generate a synthetic schema document which combines the user's list of schema documents together. From the schema loader's perspective this would look like one master schema document for the namespace which has includes to all the others. Taking that a step further, another schema document could be generated which glues all the namespaces together with imports.

        Michael Glavassevich added a comment -

        Also, this should behave the same regardless of the order of the schema documents in the array passed to SchemaFactory.newSchema().

        Michael Glavassevich made changes -
        Labels gsoc gsoc2012 gsoc gsoc2012 mentor
        Shakya Wijerama added a comment -

        Hello Michael,

        In Xerces, the implementation class for javax.xml.validation.SchemaFactory is XMLSchemaFactory. Its newSchema( Source[] schemas ) method takes an array of schemas. When the schema loader loads a grammar, it puts the grammar (SchemaGrammar) into a grammar bucket, which stores the grammars in a hashtable (fGrammarRegistry). However, when putting the grammar object into fGrammarRegistry, Xerces uses the namespace of the grammar as the key. As a result, when the array passed to newSchema contains more than one schema with the same namespace, fGrammarRegistry does not hold multiple grammar objects, even though the schemas are different.

        If all the schemas passed to the newSchema method have the same namespace, there will be only one object in fGrammarRegistry (probably the last grammar in the list of schemas, since the previous value is overwritten by the new value for the same key). Then only one grammar is cached in the grammar cache. In the end, the count in the grammarPool becomes one and a SimpleXMLSchema is returned from the newSchema() method.
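The effect described above can be illustrated with a toy model. This is not Xerces code: the Grammar class and both register methods are hypothetical stand-ins for a registry keyed by target namespace. Whether the first or the last grammar survives depends on the insertion policy, but either way a namespace-keyed map holds a single entry per namespace.

```java
import java.util.HashMap;
import java.util.Map;

public class NamespaceKeyedRegistry {

    /** Hypothetical stand-in for a loaded grammar; not a Xerces class. */
    static final class Grammar {
        final String targetNamespace; // "" models a schema without a target namespace
        final String sourceName;

        Grammar(String targetNamespace, String sourceName) {
            this.targetNamespace = targetNamespace;
            this.sourceName = sourceName;
        }
    }

    /** Later grammars replace earlier ones that share a namespace. */
    static Map<String, Grammar> registerOverwriting(Grammar... grammars) {
        Map<String, Grammar> registry = new HashMap<>();
        for (Grammar g : grammars) {
            registry.put(g.targetNamespace, g);
        }
        return registry;
    }

    /** The first grammar for a namespace wins; later ones are ignored. */
    static Map<String, Grammar> registerKeepingFirst(Grammar... grammars) {
        Map<String, Grammar> registry = new HashMap<>();
        for (Grammar g : grammars) {
            registry.putIfAbsent(g.targetNamespace, g);
        }
        return registry;
    }

    public static void main(String[] args) {
        Grammar s1 = new Grammar("", "schema1.xsd");
        Grammar s2 = new Grammar("", "schema2.xsd");

        Map<String, Grammar> overwriting = registerOverwriting(s1, s2);
        Map<String, Grammar> keepingFirst = registerKeepingFirst(s1, s2);

        // Both policies end up with one entry for the shared namespace.
        System.out.println(overwriting.size() + " entry, from " + overwriting.get("").sourceName);
        System.out.println(keepingFirst.size() + " entry, from " + keepingFirst.get("").sourceName);
    }
}
```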

        I believe this is the cause of the bug; please correct me if I am wrong. I am looking into the solution you mentioned in your last post, creating a synthetic schema.

        Thanks,
        Shakya.

        Michael Glavassevich added a comment -

        Hi Shakya,

        It's by design that there's only one SchemaGrammar object per namespace. The schema loader understands how to traverse a graph of schema documents, starting from a top-level schema with imports and includes. It doesn't know how to merge the results from independent loading requests very well and wasn't really designed to do that. I like to think of that more as a design limitation than a bug in the schema loader. Creating a synthetic schema from the Source[] would work around the limitation by presenting the schema loader with a single graph/tree of schema documents.

        Thanks,
        Michael

        Shakya Wijerama added a comment -

        Hello Michael,

        By a synthetic schema document, do you mean merging the tree/graph of the schema documents into a single schema document?

        For example, merging the two schema documents from Sunitha's example gives the following:

        <?xml version="1.0" encoding="UTF-8"?>
        <xs:schema elementFormDefault="qualified" xml:lang="EN" xmlns:xs="http://www.w3.org/2001/XMLSchema">

        <!-- <xs:include schemaLocation="schema2.xsd"/> -->

        <xs:element name="root">
        <xs:complexType>
        <xs:all>
        <xs:element ref="elem1" minOccurs="0"/>
        </xs:all>
        </xs:complexType>
        </xs:element>
        <xs:element name="elem1">
        <xs:complexType>
        <xs:attribute name="value" type="ValueType" use="required"/>
        </xs:complexType>
        </xs:element>

        <xs:simpleType name="ValueType">
        <xs:restriction base="xs:string">
        <xs:enumeration value="abc"/>
        <xs:enumeration value="def"/>
        </xs:restriction>
        </xs:simpleType>
        </xs:schema>

        Now, we can hand this over to the SchemaLoader as a single schema document.

        Should we generate a synthetic document only when there is more than one schema document with the same namespace in source[]? I will look into a more detailed design and post it here.

        Thanks,
        Shakya.

        Michael Glavassevich added a comment -

        Hello Shakya,

        I meant generating a schema document more like:

        <?xml version="1.0" encoding="UTF-8"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:include schemaLocation="schema1.xsd"/>
        <xs:include schemaLocation="schema2.xsd"/>
        </xs:schema>

        and passing that to the schema loader. You would need to write a custom entity resolver, so that when the schema loader tries to fetch each of the includes it returns each of the original sources that were passed in the Source[].
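The idea can be tried from user code with the public JAXP API. The sketch below is not the Xerces-internal change discussed here: it assumes the JDK's bundled implementation, uses SchemaFactory.setResourceResolver in place of a custom entity resolver, and the class name SyntheticSchemaDemo and the part names part-a.xsd and part-b.xsd are invented for the example. It generates a driver document with one include per part and serves each part from memory when the loader asks for it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.xml.sax.SAXException;

public class SyntheticSchemaDemo {

    // Two independent documents in the same (absent) namespace; neither includes the other.
    static final String PART_A =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='root'><xs:complexType><xs:all>"
        + "<xs:element ref='elem1' minOccurs='0'/>"
        + "</xs:all></xs:complexType></xs:element>"
        + "<xs:element name='elem1'><xs:complexType>"
        + "<xs:attribute name='value' type='ValueType' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:schema>";

    static final String PART_B =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:simpleType name='ValueType'><xs:restriction base='xs:string'>"
        + "<xs:enumeration value='abc'/><xs:enumeration value='def'/>"
        + "</xs:restriction></xs:simpleType>"
        + "</xs:schema>";

    /** Builds one Schema from same-namespace parts; map keys are made-up schemaLocation names. */
    static Schema compose(Map<String, String> parts) {
        try {
            DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");

            // Synthetic driver document: one xs:include per part.
            StringBuilder driver = new StringBuilder(
                "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>");
            for (String name : parts.keySet()) {
                driver.append("<xs:include schemaLocation='").append(name).append("'/>");
            }
            driver.append("</xs:schema>");

            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
                if (systemId == null) {
                    return null;
                }
                // Match on the last path segment so that literal and expanded
                // system IDs are both handled.
                String text = parts.get(systemId.substring(systemId.lastIndexOf('/') + 1));
                if (text == null) {
                    return null; // not one of our parts: fall back to default resolution
                }
                LSInput input = ls.createLSInput();
                input.setSystemId(systemId);
                input.setBaseURI(baseURI);
                input.setCharacterStream(new StringReader(text));
                return input;
            });
            return factory.newSchema(new StreamSource(new StringReader(driver.toString())));
        } catch (ReflectiveOperationException | SAXException e) {
            throw new IllegalStateException("composite schema could not be built", e);
        }
    }

    /** Returns true if the document validates against the composed schema. */
    static boolean validates(Map<String, String> parts, String xml) {
        try {
            compose(parts).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because every part is reached through the same driver document, the loader sees a single tree of schema documents, and the order of the entries in the map should not matter.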

        Thanks.

        Michael Glavassevich made changes -
        Assignee Shakya Wijerama [ shakya.wijerama ]
        Shakya Wijerama added a comment -

        Hi,

        I am attaching the source code changes and the classes added during the project.
        There are still some modifications to make in the code, such as exception handling.

        This is the list of files changed:

        1. XMLSchemaFactory.java

        private String getNameSpace(XMLInputSource is) - added this method
        private XMLInputSource generateDummySchema(XMLInputSource[] xmlInputSources) - added this method
        public Schema newSchema( Source[] schemas ) throws SAXException - modified this method

        2. SAXContentHandler (newly added class file)

        3. MyEntityResolver (newly added class file - the class name should be refactored)

        I will update with the final code soon.

        Thanks.

        Shakya Wijerama made changes -
        Attachment MyEntityResolver.java [ 12541527 ]
        Attachment SAXContentHandler.java [ 12541528 ]
        Attachment XMLSchemaFactory.java [ 12541529 ]
        Shakya Wijerama made changes -
        Attachment source_v0.2.tar.gz [ 12541700 ]
        Shakya Wijerama added a comment -

        Hi Michael and devs,

        I have attached the latest code for the project and hope to get your feedback on it for further improvements.

        Thanks.

        Shakya Wijerama made changes -
        Attachment source_v0.2.tar.gz [ 12541701 ]
        Shakya Wijerama made changes -
        Attachment source_v0.2.tar.gz [ 12541700 ]

          People

          • Assignee:
            Shakya Wijerama
            Reporter:
            Sunitha
          • Votes:
            9
            Watchers:
            9
