TIKA-578: XMLParser ContentHandler: multiple endDocument calls

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 0.9
    • Component/s: parser
    • Labels: None
    • Environment: N/A

      Description

      When a ContentHandler is supplied to an XMLParser instance, the ContentHandler's .endDocument() method is called twice: once by the SAXParser created within XMLParser, and once explicitly by XMLParser itself.

      Sample code:

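      import java.io.InputStream;
      import javax.xml.transform.dom.DOMResult;
      import javax.xml.transform.sax.SAXTransformerFactory;
      import javax.xml.transform.sax.TransformerHandler;
      import org.apache.tika.metadata.Metadata;
      import org.apache.tika.parser.ParseContext;
      import org.apache.tika.parser.xml.DcXMLParser;
      import org.apache.tika.parser.xml.XMLParser;

      InputStream inputStream = ...
      XMLParser parser = new DcXMLParser();
      ParseContext context = new ParseContext();
      Metadata metadata = new Metadata();

      DOMResult result = new DOMResult();
      SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
      TransformerHandler transformerHandler = factory.newTransformerHandler();
      transformerHandler.setResult(result);

      parser.parse(inputStream, transformerHandler, metadata, context); // throws on the second endDocument()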

      The following exception is produced:

      java.util.EmptyStackException
      at java.util.Stack.peek(Stack.java:85)
      at java.util.Stack.pop(Stack.java:67)
      at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.endDocument(SAX2DOM.java:143)
      at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.endDocument(ToXMLSAXHandler.java:181)
      at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:231)
      at org.apache.tika.sax.ContentHandlerDecorator.endDocument(ContentHandlerDecorator.java:115)
      at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:212)
      at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
      ...

      We have worked around the issue temporarily by passing in a ContentHandler that swallows the first .endDocument() call and forwards only the second. However, we believe XMLParser should suppress the extraneous .endDocument() call internally.
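
      A minimal sketch of the kind of workaround handler described above, assuming only the JDK's SAX classes. The class name SkipFirstEndDocumentFilter is hypothetical and is not part of Tika; it is an illustration of the idea, not the reporter's actual code and not the fix that went into 0.9. It builds on org.xml.sax.helpers.XMLFilterImpl, which forwards every content event to the handler set via setContentHandler(), and overrides only endDocument().

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical workaround handler (not part of Tika): drops the first
 * endDocument() event and forwards every other SAX event unchanged.
 * Intended for a single parse; create a new instance per document.
 */
class SkipFirstEndDocumentFilter extends XMLFilterImpl {

    // Becomes true once the first endDocument() has been swallowed.
    private boolean firstEndSeen = false;

    SkipFirstEndDocumentFilter(ContentHandler delegate) {
        // XMLFilterImpl passes all content events on to this handler.
        setContentHandler(delegate);
    }

    @Override
    public void endDocument() throws SAXException {
        if (!firstEndSeen) {
            // First call: swallow it so the delegate stays open.
            firstEndSeen = true;
            return;
        }
        // Second (and any later) call: forward to the delegate.
        super.endDocument();
    }
}
```

      With the sample code above, the wrapped handler would be passed in place of the bare one, for example parser.parse(inputStream, new SkipFirstEndDocumentFilter(transformerHandler), metadata, context). The sketch relies on the affected version issuing exactly two endDocument() calls; against a parser that calls it once, the delegate would never see the end of the document.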

          People

          • Assignee: Jukka Zitting
          • Reporter: Scott Severtson