SOLR-1669

Make XMLWriter write out xml that references the SOLR response XSD/DTD

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2, 1.3, 1.4
    • Fix Version/s: 4.9, 5.0
    • Component/s: Response Writers
    • Labels:
      None
    • Environment:

      My MacBook Pro, Christmas-break style.

      Description

      As described in SOLR-17, this is patch #2 of 3. It will make a simple modification to XMLWriter which makes the output response XML reference the SOLR XSD file. This way, clients can validate against it.
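A minimal sketch of what such a modification might look like, not the actual patch. It assumes the response XML has no target namespace, so the reference is written as `xsi:noNamespaceSchemaLocation`; the class name, the method names, and the XSD location passed in are all hypothetical. Passing `null` leaves the output exactly as it is today.

```java
final class ResponseHeaderSketch {
    /** Namespace of the standard xsi:* attributes, as defined by XML Schema. */
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    /**
     * Builds the XML declaration and the opening response tag.
     * schemaLocation is a hypothetical parameter: when it is null,
     * no schema reference is written at all.
     */
    static String responseStart(String schemaLocation) {
        StringBuilder out = new StringBuilder();
        out.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.append("<response");
        if (schemaLocation != null) {
            // Declare the xsi prefix, then point at the XSD.
            out.append(" xmlns:xsi=\"").append(XSI_NS).append('"');
            out.append(" xsi:noNamespaceSchemaLocation=\"")
               .append(escapeAttribute(schemaLocation)).append('"');
        }
        out.append(">\n");
        return out.toString();
    }

    /** Escapes the characters that are unsafe inside a double-quoted attribute. */
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

Calling `ResponseHeaderSketch.responseStart("response.xsd")` would produce a root element carrying the reference, while `responseStart(null)` produces a bare `<response>` tag.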

          Activity

          Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Hoss Man added a comment -

          Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          email notification suppressed to prevent mass-spam
          pseudo-unique token identifying these issues: hoss20120321nofix36

          Robert Muir added a comment -

          3.4 -> 3.5

          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria were "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Chris A. Mattmann added a comment -

          Hey Hoss:

          1. that doesn't sound right ... i thought Schema declaration URIs were like Namespace URIs - they were just a unique name, that the XML parser would then be configured with a mapping to a local file (or remote URL) for the parser to use? ... it's supposed to be a real URL?!?!?!

          Schema declaration URIs are like namespace URIs; however, you can provide a schemaLocation that is a resolvable URL, and the XML parser can then decide (as one of its strategies) to pull it down. I found this on the XML Schema standards site at the W3C:

          http://www.w3.org/TR/xmlschema-1/#schema-loc

          Here's the relevant part pasted in:

          Schema Representation Constraint: Schema Document Location Strategy
          Given a namespace name (or none) and (optionally) a URI reference from xsi:schemaLocation or xsi:noNamespaceSchemaLocation, schema-aware processors may implement any combination of the following strategies, in any order:
          1 Do nothing, for instance because a schema containing components for the given namespace name is already known to be available, or because it is known in advance that no efforts to locate schema documents will be successful (for example in embedded systems);
          2 Based on the location URI, identify an existing schema document, either as a resource which is an XML document or a <schema> element information item, in some local schema repository;
          3 Based on the namespace name, identify an existing schema document, either as a resource which is an XML document or a <schema> element information item, in some local schema repository;
          4 Attempt to resolve the location URI, to locate a resource on the web which is or contains or references a <schema> element;
          5 Attempt to resolve the namespace name to locate such a resource.
          Whenever possible configuration and/or invocation options for selecting and/or ordering the implemented strategies should be provided. 
          

          However, reading this standards doc suggests something we can do to alleviate this. We can provide the XSD in the SOLR war file and then reference it locally, as you suggest (by setting xsi:schemaLocation to it).

          2. If schema URIs really are URLs then we should absolutely NOT reference anything in *.apache.org for the schema URL ... we (and our users) don't want/need every solr client on the planet hitting the apache webservers for this kind of validation. it would make a lot more sense to include the XSD in the war and keep it all on the same host

          Good point. I'll address as part of 1.
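One way the same-host idea from point 1 could be sketched: derive the schema location from the address the Solr webapp is already served from, so clients that choose to fetch the XSD never leave that host. This is only an illustration under stated assumptions; the class name, the method name, and the XSD file name `response.xsd` are hypothetical, and nothing here is taken from the attached patch.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class LocalSchemaLocationSketch {
    // Hypothetical name for an XSD shipped inside the war.
    static final String XSD_FILE_NAME = "response.xsd";

    /**
     * Builds a schema location on the same host and webapp context
     * that served the request, instead of a URL under *.apache.org.
     * contextPath is expected to start with "/", for example "/solr".
     */
    static String sameHostSchemaLocation(String scheme, String host, int port, String contextPath) {
        try {
            String path = contextPath + "/" + XSD_FILE_NAME;
            return new URI(scheme, null, host, port, path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot build schema location", e);
        }
    }
}
```

For a webapp at `http://localhost:8983/solr`, this yields `http://localhost:8983/solr/response.xsd`.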

          Hoss Man added a comment -

          Yes, I mean it's a real URL. Once the patch from SOLR-17 is applied, it will become a real URL

          1. that doesn't sound right ... i thought Schema declaration URIs were like Namespace URIs – they were just a unique name, that the XML parser would then be configured with a mapping to a local file (or remote URL) for the parser to use? ... it's supposed to be a real URL?!?!?!

          2. If schema URIs really are URLs then we should absolutely NOT reference anything in *.apache.org for the schema URL ... we (and our users) don't want/need every solr client on the planet hitting the apache webservers for this kind of validation. it would make a lot more sense to include the XSD in the war and keep it all on the same host

          Chris A. Mattmann added a comment -

          Back compat.

          XML Parsers can be configured to do validation if/when a schema/dtd is declared. Clients unknowingly using a parser configured that way could (and most likely would) perceive a big slowdown in their apps if we started including the schema declaration and that triggered expensive validation when they didn't want it.

          there are also people using the XML response format for incredibly small responses (think an autosuggest component) where a schema declaration would double the size of the response

          Okey dokey – no schema by default then!

          I'll try and prepare an update to this issue and to SOLR-17 and throw up the patches today. Thanks for the feedback!

          Hoss Man added a comment -

          The default should be to not include a schema.

          Why?

          Back compat.

          XML Parsers can be configured to do validation if/when a schema/dtd is declared. Clients unknowingly using a parser configured that way could (and most likely would) perceive a big slowdown in their apps if we started including the schema declaration and that triggered expensive validation when they didn't want it.

          there are also people using the XML response format for incredibly small responses (think an autosuggest component) where a schema declaration would double the size of the response.
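To make the distinction concrete, here is a small sketch using the standard JAXP DOM API. A parser left at its defaults is non-validating, so it treats the schema reference as an ordinary attribute and does not try to fetch it; the cost described above only arises for clients whose parser has schema validation switched on. The class name, method name, and the `example.invalid` URL in the usage note are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class NonValidatingParseSketch {
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    /**
     * Parses the response with a default (non-validating) parser and
     * returns the schema hint on the root element, or "" if there is none.
     * The hint is only read as an attribute; it is never resolved.
     */
    static String schemaHint(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to look up xsi:* by namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement()
                    .getAttributeNS(XSI_NS, "noNamespaceSchemaLocation");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse response", e);
        }
    }
}
```

Feeding it a response whose root element carries `xsi:noNamespaceSchemaLocation="http://example.invalid/response.xsd"` returns that string without any network access, even though the URL cannot be resolved.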

          Chris A. Mattmann added a comment -

          do you mean to say that the schema location provided by the xml is a real url? it is just a unique string. it could be just about anything else.

          Yes, I mean it's a real URL. Once the patch from SOLR-17 is applied, it will become a real URL (and as part of SOLR-17 there needs to be a new site build/push out).

          The default should be to not include a schema.

          Why?

          Yonik Seeley added a comment -

          The default should be to not include a schema.

          Noble Paul added a comment -

          do you mean to say that the schema location provided by the xml is a real url? it is just a unique string. it could be just about anything else.

          Chris A. Mattmann added a comment -

          Clients can validate locally if they separately download the XSD but then there is no mechanism to inject it into the response XML as a reference, which is required for runtime validation. But if the XML response references the XSD, and the client is set to validate (a user's choice), there is no need to (separately) download the XSD locally to do runtime validation.
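For comparison, the first case (a client that already holds the XSD and validates without any reference in the response) could be sketched with the standard `javax.xml.validation` API as follows. The tiny schema below is a made-up stand-in used only so the example runs on its own; it is not the Solr response XSD from SOLR-17, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class LocalXsdValidationSketch {
    // Made-up stand-in schema: a <response> holding one integer <status>.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"response\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"status\" type=\"xs:int\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Compiles an XSD that the client obtained separately. */
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    /** Returns true if the document validates against the given schema. */
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handler throws on validation errors
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here the client has to wire the schema in by hand; with a reference in the response, a parser configured for validation could locate the XSD on its own.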

          Noble Paul added a comment -

          clients can still validate it if they have the XSD. What good is the noSchema thing?

          Chris A. Mattmann added a comment -
          • patch to implement referencing of XSD by XMLWriter

            People

            • Assignee:
              Unassigned
              Reporter:
              Chris A. Mattmann
            • Votes:
              0
              Watchers:
              0
