XERCESJ-1276 (project: Xerces2-J)

Improve performance of XML Schema identity-constraint validation: XMLSchemaValidator$ValueStoreBase.contains() is painfully slow



      Description

      Under certain conditions, the contains() method in XMLSchemaValidator$ValueStoreBase can cripple the performance of parsing and validation.

      I'm not sure exactly what those conditions are, but as a rough guide: I was using JAXB2 to deserialize a 22 MB XML file. Without schema validation it took 5 seconds; with validation it took over 3 minutes (JDK 1.5.0_10 on win32). My profiler pointed squarely at this method in XMLSchemaValidator.

      My suspicions were reinforced by this comment in the source:

      public boolean contains() {
          // REVISIT: we can improve performance by using hash codes, instead of
          // traversing global vector that could be quite large.
          ...
      }

      This comment is present in the Xerces 2.6.2 code bundled with JDK 1.5.0_10, and also in the source for Xerces 2.9.1.
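The REVISIT comment points at the cost model. If every contains() call walks all previously stored key tuples, checking n unique tuples takes 0 + 1 + ... + (n - 1) = n(n - 1)/2 comparisons, which grows quadratically with the number of keys, whereas a hash-based lookup keeps each check at expected constant time. The following is a minimal Java sketch of the two approaches under stated assumptions, not Xerces code: ValueStoreSketch, ValueStore, LinearValueStore, HashedValueStore and countDuplicates are hypothetical names, and key tuples are modelled as plain List<Object> values compared with Java equals().

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not Xerces code): duplicate detection for the field
 * tuples selected by an identity constraint such as xs:unique or xs:key.
 */
public class ValueStoreSketch {

    /** Records key tuples and reports duplicates. */
    interface ValueStore {
        /** Stores the tuple; returns false if an equal tuple was already stored. */
        boolean add(List<Object> tuple);
    }

    /** Linear scan in the spirit of "traversing a global vector": O(n) per lookup. */
    static final class LinearValueStore implements ValueStore {
        private final List<List<Object>> tuples = new ArrayList<>();

        boolean contains(List<Object> tuple) {
            // Compare against every tuple stored so far.
            for (List<Object> stored : tuples) {
                if (stored.equals(tuple)) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public boolean add(List<Object> tuple) {
            if (contains(tuple)) {
                return false;
            }
            tuples.add(new ArrayList<>(tuple)); // defensive copy
            return true;
        }
    }

    /** Hash-based alternative: expected O(1) per lookup. */
    static final class HashedValueStore implements ValueStore {
        // List.hashCode()/equals() are element-wise, so equal tuples hash alike.
        private final Set<List<Object>> tuples = new HashSet<>();

        boolean contains(List<Object> tuple) {
            return tuples.contains(tuple);
        }

        @Override
        public boolean add(List<Object> tuple) {
            return tuples.add(new ArrayList<>(tuple)); // false if already present
        }
    }

    /** Feeds every tuple to the store and counts how many were duplicates. */
    static int countDuplicates(ValueStore store, List<List<Object>> tuples) {
        int duplicates = 0;
        for (List<Object> tuple : tuples) {
            if (!store.add(tuple)) {
                duplicates++;
            }
        }
        return duplicates;
    }
}
```

Both stores report the same duplicates; only the lookup cost differs. One caveat, stated as an assumption about what a real fix would need: identity-constraint fields are compared by schema value rather than lexical form (for example, xs:decimal values 1 and 1.0 are equal), so any hash code must be consistent with the validator's value equality, which the plain List.equals()/hashCode() used here does not model.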

        Attachments

        1. XMLSchemaValidator.java (192 kB, Natan Cox)
        2. Xerces-J-src.2.11.0_patch1276.txt (18 kB, Jens Dittrich)
        3. xerces-value-store.txt (18 kB, Chris Simmons)
        4. xerces-binaries-patched-over-2.11.0.zip (1.38 MB, Deepak Kumar)
        5. xerces-fast-unique-check.diff (103 kB, Antti S. Lankila)


            People

            • Assignee: Unassigned
            • Reporter: Kenny MacLeod (skaffman)
            • Votes: 5
            • Watchers: 8
