  Xerces2-J / XERCESJ-1032

Bizarre interaction between choice restrictions and substitutionGroups

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.2
    • Fix Version/s: 2.9.0
    • None
    • Environment: java 1.4 on linux

    Description

      You've probably seen this one before, but there's a really wacky interaction between the way substitutionGroups work and how choice restrictions are specified. The particular oddness that I'm looking at results from the combination of three rules from the schema spec:

      1) substitutionGroups are supposed to be expanded into choices prior to checking the validity of a restriction
      2) When validating that a choice particle is a valid restriction of another choice particle, MapAndSum specifies an order-preserving mapping between the particles of the base and the restriction
      3) The order of the particles in the generated choice for a substitutionGroup is nowhere specified.

      As a consequence, if you have an element Er in a restricting type R of a base type B, corresponding to an element Eb in the base type, then you run into difficulties if Er is in the substitutionGroup of Eb and there is at least one element in the substitutionGroup of Er. (See the attached schema for an example) Basically what you end up with is two choice groups of undefined ordering, one of which is supposed to be a restriction of the other.

      Currently, the ordering that Xerces applies is a haphazard product of the way that loop iterations are conducted in the SubstitutionGroupHandler and XSConstraints; as a result, schemas of this nature are almost always marked as invalid due to a MapAndSum error.
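
To illustrate why the ordering matters, here is a minimal sketch of an order-preserving mapping check. It is not the Xerces implementation: particles are reduced to plain names, "matching" is simplified to name equality, and the class and method names (`OrderPreservingMapping`, `mapsInOrder`) are hypothetical. The same set of elements passes or fails depending only on the order in which the expanded choice happens to list them.

```java
import java.util.Arrays;
import java.util.List;

// Simplified illustration only; names and the equality-based matching are assumptions.
class OrderPreservingMapping {

    // Returns true if every particle of the restriction can be mapped onto a
    // particle of the base while moving strictly forward through the base,
    // i.e. the restriction is an in-order subsequence of the base.
    public static boolean mapsInOrder(List<String> base, List<String> restriction) {
        int b = 0;
        for (String r : restriction) {
            // Skip base particles that do not match the current restriction particle.
            while (b < base.size() && !base.get(b).equals(r)) {
                b++;
            }
            if (b == base.size()) {
                return false; // ran out of base particles: no order-preserving mapping
            }
            b++; // this base particle is consumed
        }
        return true;
    }

    public static void main(String[] args) {
        // Expanded choice for the base element Eb (order as the base happened to produce it).
        List<String> base = Arrays.asList("Eb", "Er", "Es");

        // Same members in the restriction, listed in two different orders.
        System.out.println(mapsInOrder(base, Arrays.asList("Er", "Es"))); // true
        System.out.println(mapsInOrder(base, Arrays.asList("Es", "Er"))); // false
    }
}
```

Here `Es` stands for a hypothetical element in the substitutionGroup of `Er`. The second call fails even though it contains exactly the same elements as the first, which is the situation described above.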

      Given that the validity of such schemas is left open by the schema spec due to its silence on the ordering of the generated choices, and the fact that the schemas in question are "obviously valid" to a human reader, I would like to propose a patch that ensures that substitutionGroups always have a defined ordering (I picked an ordering on namespace, then localname, but it isn't really important provided that it's consistent). The patch means that Xerces reports schemas of the form described above as valid; given that all substitutionGroups are now consistently ordered, I don't see that this could have any negative side-effects from the point of view of validation. There are two potential downsides. The first is the performance of using a TreeSet rather than a Vector (I have done some very unscientific testing on some pretty large schemas with very heavy use of substitutionGroups, and I couldn't see any measurable difference). The second is that it obviously changes the ordering reported for the elements in a substitutionGroup by the XSModel, but I don't see this as particularly significant, since I believe the ordering would already have been non-deterministic for subGroups that spanned multiple grammars (the grammar bucket is based around a Hashtable).
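
The proposed ordering can be sketched roughly as follows. This is not the contents of patch.txt: `ElementName` is a hypothetical stand-in for an element declaration, and treating an absent namespace as the empty string is an assumption made for the example. It uses a `TreeSet` with a comparator on namespace, then local name, so the members come out in the same order no matter how they were collected.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only; all names here are hypothetical, not Xerces classes.
class SubstitutionGroupOrdering {

    // Minimal stand-in for an element declaration: target namespace plus local name.
    static final class ElementName {
        final String namespace; // null means "no namespace"
        final String localName;

        ElementName(String namespace, String localName) {
            this.namespace = namespace;
            this.localName = localName;
        }

        @Override
        public String toString() {
            return "{" + (namespace == null ? "" : namespace) + "}" + localName;
        }
    }

    // Order by namespace first (absent namespace treated as ""), then by local name.
    static final Comparator<ElementName> BY_NAMESPACE_THEN_LOCAL_NAME =
            Comparator.comparing((ElementName e) -> e.namespace == null ? "" : e.namespace)
                      .thenComparing(e -> e.localName);

    // Returns the members in the defined order, independent of insertion order.
    public static List<String> ordered(List<ElementName> members) {
        TreeSet<ElementName> sorted = new TreeSet<>(BY_NAMESPACE_THEN_LOCAL_NAME);
        sorted.addAll(members);
        List<String> result = new ArrayList<>();
        for (ElementName e : sorted) {
            result.add(e.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<ElementName> members = new ArrayList<>();
        members.add(new ElementName("urn:b", "x"));
        members.add(new ElementName("urn:a", "z"));
        members.add(new ElementName("urn:a", "y"));
        System.out.println(ordered(members)); // [{urn:a}y, {urn:a}z, {urn:b}x]
    }
}
```

Because both the base and the restricting substitutionGroup would be expanded through the same comparator, their generated choices list shared members in the same relative order, which is what an order-preserving mapping needs.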

      Attachments

        1. patch.txt
          3 kB
          Lucian Holland
        2. patch.txt
          8 kB
          Lucian Holland
        3. test1.xsd
          0.6 kB
          Lucian Holland

          People

            Assignee: Sandy Gao (sandygao@ca.ibm.com)
            Reporter: Lucian Holland (lefh@decisionsoft.com)
            Votes: 1
            Watchers: 1