Tika / TIKA-578

XMLParser ContentHandler: multiple endDocument calls


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 0.9
    • Component/s: parser
    • Labels: None
    • N/A

    Description

      When a ContentHandler is supplied to an XMLParser instance, the ContentHandler's .endDocument() method is called twice: once by the SAXParser created within XMLParser, and once explicitly by XMLParser itself.
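      The double call comes from two layers each closing the same document. One way to avoid it is for the outer parse method to own the document boundaries and hide the inner SAX parser's startDocument()/endDocument() events from the caller's handler. The following is a minimal JDK-only sketch of that idea, not Tika's actual fix; DocumentBoundaryShield and SingleEndDocumentParser are hypothetical names used only for illustration.

      ```java
      import java.io.IOException;
      import java.io.InputStream;
      import javax.xml.parsers.ParserConfigurationException;
      import javax.xml.parsers.SAXParserFactory;
      import org.xml.sax.ContentHandler;
      import org.xml.sax.InputSource;
      import org.xml.sax.SAXException;
      import org.xml.sax.XMLReader;
      import org.xml.sax.helpers.XMLFilterImpl;

      /** Hypothetical: forwards every event except the inner parser's document boundaries. */
      class DocumentBoundaryShield extends XMLFilterImpl {
          DocumentBoundaryShield(ContentHandler target) {
              setContentHandler(target);
          }

          @Override
          public void startDocument() {
              // Swallowed: the outer parse method emits startDocument() itself.
          }

          @Override
          public void endDocument() {
              // Swallowed: the outer parse method emits endDocument() itself.
          }
      }

      /** Hypothetical parser wrapper that guarantees exactly one endDocument() call. */
      class SingleEndDocumentParser {
          static void parse(InputStream stream, ContentHandler handler)
                  throws IOException, SAXException, ParserConfigurationException {
              SAXParserFactory factory = SAXParserFactory.newInstance();
              factory.setNamespaceAware(true);
              XMLReader reader = factory.newSAXParser().getXMLReader();
              // The inner SAX parser talks to the shield, never directly to the caller's handler.
              reader.setContentHandler(new DocumentBoundaryShield(handler));

              handler.startDocument();
              reader.parse(new InputSource(stream));
              handler.endDocument(); // the only endDocument() the caller's handler sees
          }
      }
      ```

      With this arrangement a DOM-building handler such as the TransformerHandler from the sample code below would receive a single, balanced startDocument()/endDocument() pair.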

      Sample code:

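      import java.io.InputStream;
      import javax.xml.transform.dom.DOMResult;
      import javax.xml.transform.sax.SAXTransformerFactory;
      import javax.xml.transform.sax.TransformerHandler;
      import org.apache.tika.metadata.Metadata;
      import org.apache.tika.parser.ParseContext;
      import org.apache.tika.parser.xml.DcXMLParser;
      import org.apache.tika.parser.xml.XMLParser;

      InputStream inputStream = ...
      XMLParser parser = new DcXMLParser();
      ParseContext context = new ParseContext();
      Metadata metadata = new Metadata();

      // Build a DOM from the SAX events; the DOM builder fails on a second endDocument()
      DOMResult result = new DOMResult();
      TransformerHandler transformerHandler = ((SAXTransformerFactory) SAXTransformerFactory.newInstance()).newTransformerHandler();
      transformerHandler.setResult(result);

      parser.parse(inputStream, transformerHandler, metadata, context);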

      The following exception is produced:

      java.util.EmptyStackException
      at java.util.Stack.peek(Stack.java:85)
      at java.util.Stack.pop(Stack.java:67)
      at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.endDocument(SAX2DOM.java:143)
      at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.endDocument(ToXMLSAXHandler.java:181)
      at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:231)
      at org.apache.tika.sax.ContentHandlerDecorator.endDocument(ContentHandlerDecorator.java:115)
      at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:212)
      at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
      ...

      We have temporarily worked around the issue by passing in a ContentHandler that swallows the first .endDocument() call and lets the second one through. However, we believe XMLParser should suppress the extraneous .endDocument() call internally.
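      The workaround handler can be sketched with the JDK's org.xml.sax.helpers.XMLFilterImpl, which already forwards every ContentHandler event to a wrapped handler. This is an illustrative reconstruction under that assumption, not the reporter's actual code; the class name SkipFirstEndDocument is hypothetical.

      ```java
      import org.xml.sax.SAXException;
      import org.xml.sax.helpers.XMLFilterImpl;

      /**
       * Hypothetical workaround handler: forwards all SAX events to the wrapped
       * ContentHandler, swallows the first endDocument() call and forwards the second.
       */
      class SkipFirstEndDocument extends XMLFilterImpl {
          private int endDocumentCalls = 0;

          @Override
          public void endDocument() throws SAXException {
              endDocumentCalls++;
              if (endDocumentCalls == 2) {
                  // Only the second call reaches the real handler; the first one
                  // (and any further extra calls) are dropped.
                  super.endDocument();
              }
          }
      }
      ```

      In the sample code above, one would create the wrapper, call setContentHandler(transformerHandler) on it, and pass the wrapper instead of transformerHandler to parser.parse(...). Counting calls keeps the workaround independent of which layer issues each endDocument(), but it also ties the handler to the current double-call behaviour, which is why fixing XMLParser itself is preferable.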


          People

            Assignee: Jukka Zitting
            Reporter: Scott Severtson
            Votes: 0
            Watchers: 0