  1. Xerces-C++
  2. XERCESC-1363

DataTypeListValidator extraordinarily slow for long lists

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.5.0, 2.6.0
    • Fix Version/s: 2.7.0
    • None
    • Environment: Windows 2000

    Description

      Validating an XML instance against a Schema with an unbounded xsd:list type can take far more than O(n) processing resources, where n is the number of items in the list.

      To reproduce use this Schema:

      pq.xsd

      <?xml version="1.0" encoding="utf-8" ?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:pqns="http://swsis.cambridge.arm.com/~dearlam/xercestest/" targetNamespace="http://swsis.cambridge.arm.com/~dearlam/xercestest/"
      elementFormDefault="qualified" version="0.1">
      <xs:annotation>
      <xs:documentation xml:lang="en">
      XML schema for Hofstadter's Gödel pq-System.

      Test data for list data type validation.
      </xs:documentation>
      </xs:annotation>
      <xs:element name="pqData" type="pqns:pqDataType"></xs:element>
      <xs:complexType name="pqDataType">
      <xs:complexContent>
      <xs:restriction base="xs:anyType">
      <xs:sequence minOccurs="1" maxOccurs="1">
      <xs:element name="dashes" type="pqns:dashBlockType"></xs:element>
      <xs:element name="p" type="xs:string" xsi:nill="true"></xs:element>
      <xs:element name="dashes" type="pqns:dashBlockType"></xs:element>
      <xs:element name="q" type="xs:string" xsi:nill="true"></xs:element>
      <xs:element name="dashes" type="pqns:dashBlockType"></xs:element>
      </xs:sequence>
      </xs:restriction>
      </xs:complexContent>
      </xs:complexType>
      <xs:complexType name="porqType">
      <xs:simpleContent>
      <xs:extension base="xs:string"></xs:extension>
      </xs:simpleContent>
      </xs:complexType>
      <xs:complexType name="dashBlockType">
      <xs:simpleContent>
      <xs:extension base="pqns:dataDashes"></xs:extension>
      </xs:simpleContent>
      </xs:complexType>
      <xs:simpleType name="Dash">
      <xs:restriction base="xs:string">
      <xs:pattern value="[\-]"></xs:pattern>
      </xs:restriction>
      </xs:simpleType>
      <xs:simpleType name="dataDashes">
      <xs:restriction base="pqns:DashList">
      <xs:minLength value="0" />
      </xs:restriction>
      </xs:simpleType>
      <xs:simpleType name="DashList">
      <xs:list itemType="pqns:Dash"></xs:list>
      </xs:simpleType>
      </xs:schema>

      and this XML file

      pqData0.xml

      <?xml version="1.0" encoding="utf-8" ?>
      <pqData xmlns='http://swsis.cambridge.arm.com/~dearlam/xercestest/'
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://swsis.cambridge.arm.com/~dearlam/xercestest/
      http://swsis.cambridge.arm.com/~dearlam/xercestest/pq.xsd">
      <dashes>
      -
      </dashes>
      <p/>
      <dashes>-</dashes>
      <q/>
      <dashes>-</dashes>
      </pqData>

      (replacing swsis.cambridge.arm.com/~dearlam/xercestest with your location)

      Then use
      domprint -wfpp=on pqData0.xml
      and
      domprint -n -s -wfpp=on pqData0.xml
      to print the XML non-validating and validating.

      Both print in about the same short time, as expected.

      Now, save a copy of pqData0.xml as pqData1.xml and, in the first <dashes> element, replace the single line

      -
      with 4000 lines of
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

      This gives a 500Kb file (which mimics my real data).
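
      Rather than pasting 4000 lines by hand, the large test file could be generated. The following is only a sketch: the names makePqData and writePqData are hypothetical, and 64 dashes per line is an assumption chosen so that each line is about 128 bytes, which gives roughly the 500 KB quoted above.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <fstream>
      #include <string>

      // Build a pqData document whose first <dashes> element holds `lines`
      // lines of `dashesPerLine` space-separated dashes.
      // The namespace URI is the one used in pq.xsd; substitute your own location.
      std::string makePqData(std::size_t lines, std::size_t dashesPerLine) {
          assert(dashesPerLine > 0);
          const std::string ns = "http://swsis.cambridge.arm.com/~dearlam/xercestest/";

          // One line of the list: "- - - ... -"
          std::string row;
          for (std::size_t i = 0; i < dashesPerLine; ++i) {
              if (i != 0) row += ' ';
              row += '-';
          }

          std::string doc = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n";
          doc += "<pqData xmlns='" + ns + "'\n";
          doc += "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n";
          doc += "xsi:schemaLocation=\"" + ns + "\n" + ns + "pq.xsd\">\n";
          doc += "<dashes>\n";
          for (std::size_t i = 0; i < lines; ++i) {
              doc += row;
              doc += '\n';
          }
          doc += "</dashes>\n<p/>\n<dashes>-</dashes>\n<q/>\n<dashes>-</dashes>\n</pqData>\n";
          return doc;
      }

      // Write the document to `path`; returns false if the file could not be written.
      bool writePqData(const std::string& path, std::size_t lines, std::size_t dashesPerLine) {
          std::ofstream out(path.c_str(), std::ios::binary);
          if (!out) return false;
          out << makePqData(lines, dashesPerLine);
          return static_cast<bool>(out);
      }
      ```

      Calling writePqData("pqData1.xml", 4000, 64) would produce the large file, and writePqData("pqData0.xml", 1, 1) the small one.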

      If you then try

      domprint -wfpp=on pqData1.xml
      and
      domprint -n -s -wfpp=on pqData1.xml
      the first prints instantly (pipe it to NUL if you like), but the second consumes 99% CPU for 230 seconds, then prints.

      That's only about 2 KB per second!


      (My suspicion is that XMLString::tokenizeString is using subString(), which calculates the string length,
      far too many times...)
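
      To illustrate the kind of behaviour suspected here, the sketch below compares two whitespace tokenizers. It is not the Xerces-C++ code: the function names are hypothetical, it works on plain char strings rather than XMLCh, and it only assumes that a substring helper re-measures the whole source string on every call. Under that assumption, a list of n items costs n full-length scans, so the work grows roughly with n squared instead of n.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstring>
      #include <string>
      #include <vector>

      static bool isSpace(char c) {
          return c == ' ' || c == '\t' || c == '\n' || c == '\r';
      }

      // Hypothetical helper: measures the entire source string on every call
      // before copying the range [start, end).
      static std::string subStringRescan(const char* src, std::size_t start, std::size_t end) {
          const std::size_t len = std::strlen(src);  // O(n) scan per call
          assert(start <= end && end <= len);
          return std::string(src + start, end - start);
      }

      // Roughly quadratic: one full-length scan of the source per token.
      std::vector<std::string> tokenizeRescan(const char* src) {
          std::vector<std::string> tokens;
          std::size_t i = 0;
          while (src[i] != '\0') {
              while (isSpace(src[i])) ++i;  // skip separators
              const std::size_t start = i;
              while (src[i] != '\0' && !isSpace(src[i])) ++i;
              if (i > start) tokens.push_back(subStringRescan(src, start, i));
          }
          return tokens;
      }

      // Linear: each token is copied directly, the source length is never re-measured.
      std::vector<std::string> tokenizeOnce(const char* src) {
          std::vector<std::string> tokens;
          std::size_t i = 0;
          while (src[i] != '\0') {
              while (isSpace(src[i])) ++i;  // skip separators
              const std::size_t start = i;
              while (src[i] != '\0' && !isSpace(src[i])) ++i;
              if (i > start) tokens.push_back(std::string(src + start, i - start));
          }
          return tokens;
      }
      ```

      Both functions return the same tokens; only the amount of work per token differs, which is why the slowdown would show up only for very long lists.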

      kind regards,
      David

      Attachments

        1. XMLString.cpp.patch
          1 kB
          Christian Will
        2. second_patch_XMLString.cpp.zip
          2 kB
          Christian Will
        3. pq.zip
          5 kB
          David Earlam
        4. BaseRefVectorOf.c.patch
          0.9 kB
          David Earlam

          People

            Assignee: Unassigned
            Reporter: David Earlam