The explanation for this issue is as follows:
- OMElementImpl#internalSerialize uses an OMChildrenIterator to iterator over its own children.
- OMChildrenIterator actually swallows any exception thrown by the StAXOMBuilder, and therefore by the underlying StAX parser. In this case it is the exception indicating the premature end of stream that is swallowed.
- Since OMElementImpl#internalSerialize doesn't exit at this point, it will end up calling StAXOMBuilder again, which will request new events from the StAX parser.
- One would expect that a call to next() on an XMLStreamReader that has already thrown an exception before would fail again (especially when the underlying input stream is closed). However when Woodstox is used, this is not the case, at least not in the example shown here.
There are therefore three issues that contribute to the problem described here:
1. OMChildrenIterator swallows exceptions.
2. Axiom may call next() on an XMLStreamReader that has already thrown an exception before.
3. After throwing an exception, the Woodstox parser is left in an inconsistent state (at least for the end of stream case).
These three factors can lead to an infinite recursion which eventually causes an OutOfMemoryError. When I opened this issue, I considered it more as a problem in error reporting in Axiom. However, it is more serious than I initially thought, because the same situation could occur in any piece of code using OMChildrenIterator, if the XML being parsed is invalid or when an I/O error occurs. This could either be unintentional (e.g. network error) or intentional (i.e. this flaw could potentially be used to carry out a denial of service attack against software built using Axiom).
To solve this issue I would suggest the following changes:
1. Let OMChildrenIterator#next() throw OMExceptions (again).
2. Add a flag to StAXOMBuilder that is set when an exception is thrown by the StAX parser and prevent it from calling the parser again once this flag is set.
Note that the current behavior of OMChildrenIterator#next() was implemented as part of
WSCOMMONS-91 and AXIS2-919. The rationale was that
[quote] According to the Sun documentation for the java.util.Iterator interface, the following loop:
should not throw any exceptions at all, ever, unless there is a bug in the program. [/quote]
and that therefore the previous implementation (which threw OMExceptions) would violate the java.util.Iterator contract. However I was unable to find any documentation or specification that explicitly forbids implementations of Iterator#next() to throw unchecked exceptions (other than NoSuchElementException and ConcurrentModificationException). I believe that the above statement is primarily based on a very orthodox interpretation of the Java API, namely that unchecked exceptions should only be used to indicate programming bugs. It should be noted that Axiom itself doesn't follow that paradigm, OMException being a runtime exception.
Also I believe that the risk of breaking existing code by changing the behavior of OMChildrenIterator#next() is negligibly small. The reason is that if the code uses OMChildrenIterator, then it will almost certainly use other methods of the Axiom API which can throw OMExceptions, and the code must therefore be prepared for this type of exceptions anyway. One could argue (and that was one of the arguments in
AXIS2-919) that the OMChildrenIterator could also be used by code that "may not be aware that the Iterator that they have been handed is actually an Iterator from OMElement". However, whatever such code may do with an OMException thrown by next() could not be worse than the current behavior of the next() method, i.e. to discard the exception completely.
Therefore I currently don't see any valid reason not to change the behavior of OMChildrenIterator#next().