  1. Tika
  2. TIKA-578

XMLParser ContentHandler: multiple endDocument calls

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 0.9
    • Component/s: parser
    • Labels:
      None
    • Environment:

      N/A

      Description

      When supplying a ContentHandler to an XMLParser instance, the ContentHandler's .endDocument() method is called twice: once by the SAXParser created within XMLParser, and once explicitly by XMLParser itself.

      Sample code:

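      import java.io.InputStream;
      import javax.xml.transform.dom.DOMResult;
      import javax.xml.transform.sax.SAXTransformerFactory;
      import javax.xml.transform.sax.TransformerHandler;
      import org.apache.tika.metadata.Metadata;
      import org.apache.tika.parser.ParseContext;
      import org.apache.tika.parser.xml.DcXMLParser;
      import org.apache.tika.parser.xml.XMLParser;

      InputStream inputStream = ...
      XMLParser parser = new DcXMLParser();
      ParseContext context = new ParseContext();
      Metadata metadata = new Metadata();

      // Collect the SAX events emitted by the parser into a DOM tree.
      DOMResult result = new DOMResult();
      TransformerHandler transformerHandler = ((SAXTransformerFactory) SAXTransformerFactory.newInstance()).newTransformerHandler();
      transformerHandler.setResult(result);

      parser.parse(inputStream, transformerHandler, metadata, context);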

      The following exception is produced:

      java.util.EmptyStackException
      at java.util.Stack.peek(Stack.java:85)
      at java.util.Stack.pop(Stack.java:67)
      at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.endDocument(SAX2DOM.java:143)
      at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.endDocument(ToXMLSAXHandler.java:181)
      at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:231)
      at org.apache.tika.sax.ContentHandlerDecorator.endDocument(ContentHandlerDecorator.java:115)
      at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:212)
      at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
      ...

      We have worked around the issue temporarily by passing in a ContentHandler that eats the first .endDocument() call, and allows the second to go through. However, we believe XMLParser should hide the extraneous .endDocument() call internally.
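
      The workaround described above could look roughly like the following. This is only a minimal sketch, not the reporter's actual code: the class name SkipFirstEndDocumentHandler is hypothetical and is not part of Tika. It relies on the standard org.xml.sax.helpers.XMLFilterImpl, which forwards SAX events to the handler set via setContentHandler(), and it assumes one wrapper instance is created per parse.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical wrapper (not part of Tika): swallows the first
 * endDocument() call and forwards only the second one to the
 * wrapped handler. All other SAX events pass through unchanged.
 */
class SkipFirstEndDocumentHandler extends XMLFilterImpl {

    // Number of endDocument() calls seen so far by this instance.
    private int endDocumentCalls = 0;

    SkipFirstEndDocumentHandler(ContentHandler delegate) {
        // XMLFilterImpl forwards content events to this handler.
        setContentHandler(delegate);
    }

    @Override
    public void endDocument() throws SAXException {
        endDocumentCalls++;
        // Forward only the second call; the first and any later
        // calls are dropped.
        if (endDocumentCalls == 2) {
            super.endDocument();
        }
    }
}
```

      With the sample code above, the wrapper would be passed in place of the bare handler: parser.parse(inputStream, new SkipFirstEndDocumentHandler(transformerHandler), metadata, context). Note that on a version where XMLParser calls .endDocument() only once, this wrapper would swallow that single call, so it is only suitable as a stopgap.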

            People

            • Assignee: Jukka Zitting (jukkaz)
            • Reporter: Scott Severtson (scottsevertson)