Issue Details (XML | Word | Printable)

Key: XERCESC-1051
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Alberto Massari
Reporter: Frank Rast
Votes: 2
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
Xerces-C++

Crash when maxOccurs >= 200000

Created: 14/Nov/03 06:32 PM   Updated: 10/Jul/09 01:13 PM
Component/s: Validating Parser (XML Schema)
Affects Version/s: 2.3.0
Fix Version/s: 3.0.0

Time Tracking:
Not Specified

Environment:
Operating System: Windows NT/2K
Platform: PC

Bugzilla Id: 24703
Resolution Date: 13/Jul/08 08:02 PM
Labels:


 Description  « Hide
Parser crashes in ContentSpecNode.hpp: ContentSpecNode::~ContentSpecNode().

Steps to reproduce:
validate a xml file against a schema with an element having a maxOccurs >=
200000.

Assumed cause:
Stack overfow

Makeshift resolution:
Set the repeat count to unbounded(-1), when maxOccurs > 500:

inline void ContentSpecNode::setMaxOccurs(int max)
{
    if(max > 500)
        max = -1;
    fMaxOccurs = max;
}

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Serge Knystautas made changes - 09/Apr/04 04:46 PM
Field Original Value New Value
issue.field.bugzillaimportkey 24703 18927
Alberto Massari made changes - 02/Nov/04 02:02 PM
Priority Major [ 3 ]
Alberto Massari made changes - 08/Apr/05 11:10 PM
Assignee Xerces-C Developers Mailing List [ xerces-c-dev@xml.apache.org ] Alberto Massari [ amassari ]
Christian Charbula added a comment - 21/Mar/07 10:56 AM
Well, that's really a big problem to me.

Since I've to implement a solution which was given from the UNO, about 30 goverments and some hundret credit instutes, it's really not possible to me to change the maxOccurs.
The maxOccurs is a very important limitation in this validation. (500 is to low by far)
It's not possible to set/overrule the maxOccurs to another value.

I need to use the Xerces-C++ parser, since it's also available for some 'exotic'-operating systems. (like OpenVMS)

Please set a higher priority to this issue.
Why is there a recursion needed? Can't it be implemented in a counting solution?
Thanks,
Christian

Frank Rast added a comment - 21/Mar/07 11:41 AM
I would also like to see a higher priority.

Can someone estimate when this issue will be solved?
I would prefer a counting solution too, but everything else is also better than a crash.

Any thougths?

Regards,
Frank

David Bertoni added a comment - 21/Mar/07 04:04 PM
Well, since this is an open source project, either of you could contribute. Perhaps you could investigate and work on the it?

Igor Sakhnov added a comment - 07/Jun/07 01:02 AM
Change maxOccurs does not help, try following schema and data:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="season">
    <xs:complexType>
        <xs:sequence minOccurs="100" maxOccurs="300">
           <xs:element name="spring" minOccurs="10" maxOccurs="99"/>
           <xs:element name="summer" minOccurs="10" maxOccurs="99"/>
        </xs:sequence>
    </xs:complexType>
    </xs:element>
</xs:schema>

<season>
        <spring>cool</spring>
        <summer>warm</summer>
</season>

Any suggestion on what can be tweaked? And yes, our developers submitted varios fixes for different items and as far as I know there was no feedback :)

Karsten Strunk added a comment - 07/Jun/07 03:26 PM - edited
I think, I'm encountering a similar problem with Xerces 2.7.0. Xerces crashes when maxOccurs is too large.

Xerces always allocates memory for elements according to the number specified in maxOccurs, regardless of the real number of elements in xml instance. If the real numer of elements is small, most allocated elements are not needed.

For example:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="computer">
    <xs:complexType>
        <xs:sequence>
           <xs:element name="serialNumber" maxOccurs="10000"/>
        </xs:sequence>
    </xs:complexType>
    </xs:element>
</xs:schema>

<computer>
    <serialNumber>1</serialNumber>
</computer>

In this case 10000 elements are created, but only 1 is really needed. I found the following code fragment in ComplexTypeInfo::expandContentModel:

for (int j=1; j < maxOccurs-minOccurs; j++) {
    retNode = new (fMemoryManager) ContentSpecNode
    (
        ContentSpecNode::Sequence
        , retNode
        , optional
        , true
        , false
        , fMemoryManager
    );
}

In my case I have a xml schema which uses maxOccurs="9999999". So each time this element is found in a xml instance 9999999 elements are created although normally just a few are really needed. So I'd also prefer a counting version which creates only the really needed elements.


Boris Kolpackov added a comment - 13/May/08 02:46 PM
Tentatively changing the "fix for" version to 3.0.0. Alberto has a solution which for now does not work for all cases. He will try to get it into shape and I will try to help. If this fails then we will probably need to implement the "treat large maxOccurs as unbounded" hack for the time being.

Boris Kolpackov made changes - 13/May/08 02:46 PM
Fix Version/s 3.0.0 [ 12312833 ]
Michael Glavassevich added a comment - 13/May/08 05:28 PM
You may want to take a look at what Peter McCracken and I did for Xerces-J. See XERCESJ-773 and XERCESJ-1267.

Repository Revision Date User Message
ASF #676426 Sun Jul 13 20:01:22 UTC 2008 amassari Backported fix for XERCESJ-773: build a representation of large minOccurs/maxOccurs in constant time and memory (which uses a counter during validation) for element and wildcard particles when each model group particle in the content model:

* has minOccurs/maxOccurs == 1; or
* contains only one element/wildcard particle with minOccurs/maxOccurs == 1

(XERCESC-1051)
Files Changed
MODIFY /xerces/c/trunk/src/xercesc/validators/common/DFAContentModel.cpp
ADD /xerces/c/trunk/src/xercesc/validators/common/CMRepeatingLeaf.hpp
MODIFY /xerces/c/trunk/src/xercesc/validators/common/AllContentModel.cpp
MODIFY /xerces/c/trunk/projects/Win32/VC8/xerces-all/XercesLib/XercesLib.vcproj
MODIFY /xerces/c/trunk/projects/Win32/VC9/xerces-all/XercesLib/XercesLib.vcproj
MODIFY /xerces/c/trunk/src/xercesc/validators/schema/ComplexTypeInfo.hpp
MODIFY /xerces/c/trunk/projects/Win32/VC7.1/xerces-all/XercesLib/XercesLib.vcproj
MODIFY /xerces/c/trunk/src/xercesc/validators/common/DFAContentModel.hpp
MODIFY /xerces/c/trunk/src/xercesc/validators/common/CMLeaf.hpp
MODIFY /xerces/c/trunk/tests/src/XSTSHarness/XSTSHarnessHandlers.cpp
MODIFY /xerces/c/trunk/src/Makefile.am
MODIFY /xerces/c/trunk/projects/Win32/VC6/xerces-all/XercesLib/XercesLib.dsp
MODIFY /xerces/c/trunk/src/xercesc/validators/common/ContentSpecNode.hpp
MODIFY /xerces/c/trunk/src/xercesc/validators/schema/ComplexTypeInfo.cpp

Alberto Massari added a comment - 13/Jul/08 08:02 PM
The fix for Xerces-J has been backported to Xerces-C

Alberto Massari made changes - 13/Jul/08 08:02 PM
Resolution Fixed [ 1 ]
Status Open [ 1 ] Resolved [ 5 ]
Repository Revision Date User Message
ASF #677396 Wed Jul 16 19:36:20 UTC 2008 amassari - Build a compact tree also for xsd:any nodes (XERCESC-1051)
- Reduce the memory required to build a DFA by creating a shallow CMNode hierarchy
- Avoid stack overflow when analyzing non-compact model trees
- Optimized the building of the DFA state table by avoiding linear searches
Files Changed
MODIFY /xerces/c/trunk/src/xercesc/validators/common/CMRepeatingLeaf.hpp
MODIFY /xerces/c/trunk/src/xercesc/validators/common/CMAny.cpp
MODIFY /xerces/c/trunk/tests/src/XSTSHarness/XSTSHarnessHandlers.cpp
MODIFY /xerces/c/trunk/src/xercesc/validators/common/CMUnaryOp.cpp
MODIFY /xerces/c/trunk/src/xercesc/validators/common/CMAny.hpp
MODIFY /xerces/c/trunk/src/xercesc/validators/common/CMStateSet.hpp
MODIFY /xerces/c/trunk/src/xercesc/validators/common/CMUnaryOp.hpp
MODIFY /xerces/c/trunk/src/xercesc/validators/schema/ComplexTypeInfo.cpp
MODIFY /xerces/c/trunk/tests/src/XSTSHarness/regression/Xerces.testSet
MODIFY /xerces/c/trunk/src/xercesc/validators/common/DFAContentModel.cpp
ADD /xerces/c/trunk/tests/src/XSTSHarness/regression/XERCESC-1051
MODIFY /xerces/c/trunk/src/xercesc/validators/common/CMBinaryOp.cpp
ADD /xerces/c/trunk/tests/src/XSTSHarness/regression/XERCESC-1051/test_valid.xml
MODIFY /xerces/c/trunk/src/xercesc/validators/common/DFAContentModel.hpp
MODIFY /xerces/c/trunk/src/xercesc/validators/common/CMLeaf.hpp
ADD /xerces/c/trunk/tests/src/XSTSHarness/regression/XERCESC-1051/test.xml
ADD /xerces/c/trunk/tests/src/XSTSHarness/regression/XERCESC-1051/schema.xsd
MODIFY /xerces/c/trunk/src/xercesc/validators/common/CMBinaryOp.hpp
MODIFY /xerces/c/trunk/src/xercesc/validators/common/CMNode.hpp

Frank Rast added a comment - 10/Jul/09 01:13 PM
Now it does not crash but the performance could be better. Large maxOccurs stil slow down validation. Is there a way to get better performance in this case?

Repository Revision Date User Message
ASF #807213 Mon Aug 24 13:47:35 UTC 2009 amassari Improved performance and reduced memory footprint of schema validation involving large maxOccurs:
- the CMStateSet uses a sparsely allocated matrix to store the bits, resulting in less memory usage and faster bitwise operations (when analyzing an unallocated chunk, no operations are done); also, having moved the dynamic buffer data members into a separate structure, the space used by two pointers has been added to the cached bit fields, that is now 128 bits
- the DFA builder chooses the faster algorithm depending on the data being analyzed.
The regression test for XERCESC-1051 now completes in 30 seconds instead of 80
Files Changed
MODIFY /xerces/c/trunk/src/xercesc/validators/common/DFAContentModel.cpp
MODIFY /xerces/c/trunk/src/xercesc/util/XMLString.hpp
MODIFY /xerces/c/trunk/src/xercesc/validators/common/CMStateSet.hpp
MODIFY /xerces/c/trunk/tests/src/DOM/DOMTest/DTest.cpp