Tika / TIKA-578

XMLParser ContentHandler: multiple endDocument calls

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 0.9
    • Component/s: parser
    • Labels:
      None
    • Environment:

      N/A

      Description

      When a ContentHandler is supplied to an XMLParser instance, the ContentHandler's endDocument() method is called twice: once by the SAXParser created within XMLParser, and once explicitly by XMLParser itself.
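To make the double call concrete without Tika on the classpath, the sketch below reproduces the pattern described above using only the JDK's SAX API. The class DoubleEndDocumentDemo and the method parseThenEndDocumentAgain are hypothetical stand-ins, not Tika code: the inner SAXParser fires endDocument() when it finishes, and the wrapper then fires it a second time.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class DoubleEndDocumentDemo {

    /** Counts how many times endDocument() reaches it. */
    static class CountingHandler extends DefaultHandler {
        int endDocumentCalls = 0;

        @Override
        public void endDocument() {
            endDocumentCalls++;
        }
    }

    /**
     * Hypothetical stand-in for the reported behaviour: the inner SAX parser
     * calls endDocument() at the end of the parse, then the wrapper calls it again.
     */
    static void parseThenEndDocumentAgain(byte[] xml, CountingHandler handler) throws Exception {
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        saxParser.parse(new ByteArrayInputStream(xml), handler); // first endDocument(), from the parser
        handler.endDocument();                                   // second endDocument(), explicit
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><a>text</a></root>".getBytes(StandardCharsets.UTF_8);
        CountingHandler handler = new CountingHandler();
        parseThenEndDocumentAgain(xml, handler);
        System.out.println("endDocument() calls: " + handler.endDocumentCalls); // prints 2
    }
}
```

A handler that keeps per-document state, such as the node stack behind the EmptyStackException in the trace below, cannot tolerate the second call.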

      Sample code:

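      import java.io.InputStream;
      import javax.xml.transform.dom.DOMResult;
      import javax.xml.transform.sax.SAXTransformerFactory;
      import javax.xml.transform.sax.TransformerHandler;
      import org.apache.tika.metadata.Metadata;
      import org.apache.tika.parser.ParseContext;
      import org.apache.tika.parser.xml.DcXMLParser;
      import org.apache.tika.parser.xml.XMLParser;

      InputStream inputStream = ...
      XMLParser parser = new DcXMLParser();
      ParseContext context = new ParseContext();
      Metadata metadata = new Metadata();

      // Build a DOM tree from the SAX events emitted by the parser
      DOMResult result = new DOMResult();
      TransformerHandler transformerHandler = ((SAXTransformerFactory) SAXTransformerFactory.newInstance()).newTransformerHandler();
      transformerHandler.setResult(result);

      parser.parse(inputStream, transformerHandler, metadata, context); // fails on the second endDocument()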

      The following exception is produced:

      java.util.EmptyStackException
      at java.util.Stack.peek(Stack.java:85)
      at java.util.Stack.pop(Stack.java:67)
      at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.endDocument(SAX2DOM.java:143)
      at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.endDocument(ToXMLSAXHandler.java:181)
      at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:231)
      at org.apache.tika.sax.ContentHandlerDecorator.endDocument(ContentHandlerDecorator.java:115)
      at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:212)
      at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
      ...

      We have temporarily worked around the issue by passing in a ContentHandler that swallows the first endDocument() call and lets the second one through. However, we believe XMLParser should hide the extraneous endDocument() call internally.
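The workaround described above can be sketched with only the JDK's SAX classes. The class name SkipFirstEndDocumentHandler is hypothetical and not part of Tika. It extends org.xml.sax.helpers.XMLFilterImpl, which already implements ContentHandler and forwards every event to the handler registered with setContentHandler(), so only endDocument() needs overriding.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical workaround handler: forwards every SAX event to the wrapped
 * handler, except the first endDocument() call, which it swallows.
 */
public class SkipFirstEndDocumentHandler extends XMLFilterImpl {
    private boolean firstEndDocumentSeen = false;

    public SkipFirstEndDocumentHandler(ContentHandler delegate) {
        // XMLFilterImpl forwards all ContentHandler events to this handler
        setContentHandler(delegate);
    }

    @Override
    public void endDocument() throws SAXException {
        if (!firstEndDocumentSeen) {
            firstEndDocumentSeen = true; // drop the parser-generated call
            return;
        }
        super.endDocument(); // forward the explicit call (and any later ones)
    }
}
```

With the sample code above, the call would become parser.parse(inputStream, new SkipFirstEndDocumentHandler(transformerHandler), metadata, context). Because the handler remembers whether it has already seen endDocument(), a fresh instance is assumed per parse. It is also only a stopgap: once XMLParser stops emitting the extra call, this wrapper would swallow the only endDocument() and should be removed.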

        Activity

        Jukka Zitting added a comment:

        Good point, thanks! Fixed in revision 1060818.


          People

          • Assignee:
            Jukka Zitting
          • Reporter:
            Scott Severtson
          • Votes:
            0
          • Watchers:
            0
