Xerces2-J / XERCESJ-1276

Improve performance of XML Schema Identity-constraint validation --- XMLSchemaValidator$ValueStoreBase.contains() is painfully slow.

Details

    Description

      Under certain conditions, the contains() method in XMLSchemaValidator$ValueStoreBase can cripple the performance of parsing and validation.

      I'm not sure exactly what those conditions are, but as a rough guide: I was using JAXB2 to deserialize a 22 MB XML file. Without schema validation, it took 5 seconds. With validation, it took over 3 minutes (JDK 1.5.0_10 on win32). My profiler pointed the finger squarely at XMLSchemaValidator$ValueStoreBase.contains().

      My suspicions were aroused further when I saw this comment in the source:

      public boolean contains() {
      // REVISIT: we can improve performance by using hash codes, instead of
      // traversing global vector that could be quite large.

      This comment is present in the Xerces 2.6.2 code bundled with JDK 1.5.0_10, and also in the source for 2.9.1.
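
      To illustrate what the REVISIT comment seems to be suggesting, here is a minimal sketch contrasting a linear scan with a hash-based lookup for duplicate detection. It is not the Xerces implementation: the class names ValueStoreSketch, LinearStore and HashedStore are hypothetical, and each identity-constraint value tuple is modelled as a plain List<String>, whereas the real validator compares typed schema values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Xerces code: two ways to detect duplicate
// identity-constraint value tuples while a document is being validated.
public class ValueStoreSketch {

    // Linear approach: each lookup walks every tuple stored so far,
    // so checking n tuples costs O(n^2) comparisons overall.
    static final class LinearStore {
        private final List<List<String>> tuples = new ArrayList<>();

        boolean contains(List<String> tuple) {
            for (List<String> seen : tuples) {
                if (seen.equals(tuple)) {
                    return true;
                }
            }
            return false;
        }

        // Returns false when the tuple is a duplicate.
        boolean addIfAbsent(List<String> tuple) {
            if (contains(tuple)) {
                return false;
            }
            tuples.add(new ArrayList<>(tuple)); // defensive copy
            return true;
        }
    }

    // Hash-based approach: List.hashCode()/equals() let a HashSet find
    // a tuple in expected constant time, so n tuples cost roughly O(n).
    static final class HashedStore {
        private final Set<List<String>> tuples = new HashSet<>();

        boolean contains(List<String> tuple) {
            return tuples.contains(tuple);
        }

        // Returns false when the tuple is a duplicate.
        boolean addIfAbsent(List<String> tuple) {
            return tuples.add(new ArrayList<>(tuple)); // defensive copy
        }
    }

    public static void main(String[] args) {
        LinearStore linear = new LinearStore();
        HashedStore hashed = new HashedStore();
        List<List<String>> input = Arrays.asList(
                Arrays.asList("order", "1"),
                Arrays.asList("order", "2"),
                Arrays.asList("order", "1")); // duplicate of the first tuple

        for (List<String> tuple : input) {
            System.out.println(tuple
                    + " linear accepted=" + linear.addIfAbsent(tuple)
                    + " hashed accepted=" + hashed.addIfAbsent(tuple));
        }
    }
}
```

      Both stores give the same answers; only the cost per lookup differs, which would matter for a large document with many keyed elements.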

      Attachments

        1. xerces-fast-unique-check.diff
          103 kB
          Antti S. Lankila
        2. xerces-binaries-patched-over-2.11.0.zip
          1.38 MB
          Deepak Kumar
        3. xerces-value-store.txt
          18 kB
          Chris Simmons
        4. Xerces-J-src.2.11.0_patch1276.txt
          18 kB
          Jens Dittrich
        5. XMLSchemaValidator.java
          192 kB
          Natan Cox

          People

            Assignee: Unassigned
            Reporter: Kenny MacLeod (skaffman)
            Votes: 5
            Watchers: 8
