Description
When a ContentHandler is supplied to an XMLParser instance, the ContentHandler's .endDocument() method is called twice: once by the SAXParser created within XMLParser, and once explicitly by XMLParser itself.
Sample code:
—
InputStream inputStream = ...
XMLParser parser = new DcXMLParser();
ParseContext context = new ParseContext();
Metadata metadata = new Metadata();
DOMResult result = new DOMResult();
SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
TransformerHandler transformerHandler = factory.newTransformerHandler();
transformerHandler.setResult(result);
parser.parse(inputStream, transformerHandler, metadata, context);
—
The following exception is produced:
—
java.util.EmptyStackException
at java.util.Stack.peek(Stack.java:85)
at java.util.Stack.pop(Stack.java:67)
at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.endDocument(SAX2DOM.java:143)
at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.endDocument(ToXMLSAXHandler.java:181)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:231)
at org.apache.tika.sax.ContentHandlerDecorator.endDocument(ContentHandlerDecorator.java:115)
at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:212)
at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
...
—
We have temporarily worked around the issue by passing in a ContentHandler that swallows the first .endDocument() call and lets the second go through. However, we believe XMLParser should suppress the extraneous .endDocument() call internally.
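For reference, a minimal sketch of the kind of wrapper described above. This is not our exact code. The class name SkipFirstEndDocumentHandler is hypothetical, and the sketch assumes that dropping only the first .endDocument() call is sufficient. It builds on the JDK's org.xml.sax.helpers.XMLFilterImpl, which forwards all other ContentHandler events to the wrapped handler.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical workaround wrapper: drops the first endDocument() call and
// forwards every later one (and all other SAX events) to the delegate.
final class SkipFirstEndDocumentHandler extends XMLFilterImpl {

    // Becomes true once the first endDocument() call has been swallowed.
    private boolean firstEndSwallowed = false;

    SkipFirstEndDocumentHandler(ContentHandler delegate) {
        // XMLFilterImpl forwards ContentHandler events to this handler.
        setContentHandler(delegate);
    }

    @Override
    public void endDocument() throws SAXException {
        if (!firstEndSwallowed) {
            // First call (assumed to come from the inner SAXParser): ignore it.
            firstEndSwallowed = true;
            return;
        }
        // Subsequent call (the explicit one from XMLParser): pass it through.
        super.endDocument();
    }
}
```

With the sample code above, the wrapper would be used as parser.parse(inputStream, new SkipFirstEndDocumentHandler(transformerHandler), metadata, context). A new wrapper instance is needed for each parse, since the flag is never reset.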