The JSR 173 (StAX) Specification did not do an adequate job defining the semantics for processing DTD DOCTYPE constructs.
The reference implementation's getText() returns the entire subset of the DOCTYPE instead of returning the instance (docinfo) information.
This is a known issue and has been discussed on the forum.
The problem is worse if the DOCTYPE references an external location. To get the subset, the parser implementation must make a network call.
This (a) performs poorly and (b) requires the application to be attached to a network.
In addition, the various parser implementations have different mechanisms for getting the DOCTYPE subset. Some implementations apparently defer
the processing until the getText() call, while other implementations load the subset as soon as the DOCTYPE declaration is processed.
Configuration and deployment files (e.g. web.xml) often contain DOCTYPE constructs. In many situations, the deployer may not be connected to the
network when processing the file. In such a scenario, the deployer needs a mechanism to process the file without being hindered by the DOCTYPE.
The proposed solution is to add new methods to StAXUtils:
A caller (e.g. a deployer application) can use the new methods to safely obtain an XMLStreamReader that is configured for a network-detached environment.
As StAX changes, we can update the implementation of the methods.
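For illustration only, here is a minimal sketch of one way such a method might be built on the standard javax.xml.stream API. It is not the proposed StAXUtils code. The class name NetworkDetachedReaderSketch and the methods createReader and rootElementName are hypothetical, and the approach (disabling external entities and installing an XMLResolver that answers every lookup with an empty stream) is an assumption about what "network detached" could mean in practice. Parser implementations may differ in how they honor these settings.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not the actual StAXUtils methods.
public class NetworkDetachedReaderSketch {

    // Returns a reader whose factory is configured so that the parser
    // should not need to fetch the external DOCTYPE subset over the network.
    public static XMLStreamReader createReader(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();

        // Ask the parser not to resolve external parsed entities.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);

        // Assumption: answering every external lookup (including the external
        // DTD subset) with an empty stream keeps the parser off the network.
        factory.setXMLResolver((publicId, systemId, baseUri, namespace) ->
                new ByteArrayInputStream(new byte[0]));

        return factory.createXMLStreamReader(in);
    }

    // Small usage helper: advances to the first element and returns its
    // local name, or null if the document contains no element.
    public static String rootElementName(String xml) throws XMLStreamException {
        XMLStreamReader reader = createReader(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```

A deployer could pass the contents of a web.xml whose DOCTYPE names a remote DTD to rootElementName and then continue reading from the element content. Keeping the factory configuration inside one method means it can be adjusted in a single place if parser behavior changes.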
I am working on the proposed solution and tests.