Tika / TIKA-578

XMLParser ContentHandler: multiple endDocument calls

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 0.9
    • Component/s: parser
    • Labels: None
    • Environment: N/A

      Description

      When a ContentHandler is supplied to an XMLParser instance, the ContentHandler's .endDocument() method is called twice: once by the SAXParser created within XMLParser, and once explicitly by XMLParser itself.
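      The same call pattern can be reproduced without Tika. The following is a minimal sketch using only the JDK's SAX classes; EndDocumentCounter and demo() are hypothetical names introduced for illustration and are not part of XMLParser. It assumes the pattern described above: a SAX parse that ends the document on its own, followed by an explicit endDocument() call on the same handler.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that only counts endDocument() notifications. */
class EndDocumentCounter extends DefaultHandler {
    int endDocumentCalls = 0;

    @Override
    public void endDocument() {
        endDocumentCalls++;
    }

    /** Mimics the described pattern: SAX parse, then an explicit endDocument(). */
    static int demo() throws Exception {
        EndDocumentCounter handler = new EndDocumentCounter();
        byte[] xml = "<root/>".getBytes(StandardCharsets.UTF_8);

        // First call: the SAX parser ends the document itself.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml), handler);

        // Second call: the caller ends the document again explicitly.
        handler.endDocument();

        return handler.endDocumentCalls;
    }
}
```

      Here demo() returns 2. A handler that keeps per-document state and releases it in endDocument() may fail on the second notification.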

      Sample code:

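      import java.io.InputStream;
      import javax.xml.transform.dom.DOMResult;
      import javax.xml.transform.sax.SAXTransformerFactory;
      import javax.xml.transform.sax.TransformerHandler;
      import org.apache.tika.metadata.Metadata;
      import org.apache.tika.parser.ParseContext;
      import org.apache.tika.parser.xml.DcXMLParser;
      import org.apache.tika.parser.xml.XMLParser;

      InputStream inputStream = ...
      XMLParser parser = new DcXMLParser();
      ParseContext context = new ParseContext();
      Metadata metadata = new Metadata();

      DOMResult result = new DOMResult();
      SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
      TransformerHandler transformerHandler = factory.newTransformerHandler();
      transformerHandler.setResult(result);

      parser.parse(inputStream, transformerHandler, metadata, context);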

      The following exception is produced:

      java.util.EmptyStackException
      at java.util.Stack.peek(Stack.java:85)
      at java.util.Stack.pop(Stack.java:67)
      at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.endDocument(SAX2DOM.java:143)
      at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.endDocument(ToXMLSAXHandler.java:181)
      at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:231)
      at org.apache.tika.sax.ContentHandlerDecorator.endDocument(ContentHandlerDecorator.java:115)
      at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:212)
      at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
      ...

      We have temporarily worked around the issue by passing in a ContentHandler that swallows the first .endDocument() call and lets the second go through. However, we believe XMLParser should hide the extraneous .endDocument() call internally.
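      The workaround described above could look roughly like the following sketch. The reporter's actual handler is not shown in this issue, so the class name SkipFirstEndDocumentHandler and its structure are assumptions; it builds on the JDK's XMLFilterImpl, which forwards every SAX event to the ContentHandler set on it.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical wrapper: drops the first endDocument() notification and
 * forwards every later one to the wrapped handler.
 */
class SkipFirstEndDocumentHandler extends XMLFilterImpl {
    private boolean firstEndSeen = false;

    SkipFirstEndDocumentHandler(ContentHandler delegate) {
        // XMLFilterImpl passes all other events through unchanged.
        setContentHandler(delegate);
    }

    @Override
    public void endDocument() throws SAXException {
        if (!firstEndSeen) {
            firstEndSeen = true; // swallow the first call
            return;
        }
        super.endDocument(); // forward the second call
    }
}
```

      With the sample code above, the wrapper would be passed to parser.parse(...) in place of transformerHandler, wrapping it. Note that such a wrapper depends on exactly two calls arriving: against a parser that sends only one endDocument(), it would swallow that single call and the wrapped handler would never see the end of the document.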

        Activity

        Scott Severtson created issue -
        Jukka Zitting added a comment -

        Good point, thanks! Fixed in revision 1060818.
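        The contents of that revision are not reproduced in this issue. As an illustration only, one way a parser could hide the inner SAX parser's document events from the caller's handler is to shield them during the parse and emit them once itself. DocumentEventShield and parseOnce are hypothetical names, and this sketch uses only JDK classes rather than Tika's own.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical shield: forwards all SAX events except start/endDocument. */
class DocumentEventShield extends XMLFilterImpl {
    DocumentEventShield(ContentHandler delegate) {
        setContentHandler(delegate);
    }

    @Override
    public void startDocument() {
        // Ignored: the outer code emits this event exactly once.
    }

    @Override
    public void endDocument() {
        // Ignored: the outer code emits this event exactly once.
    }

    /** Parses xml so that handler sees one startDocument and one endDocument. */
    static void parseOnce(InputStream xml, ContentHandler handler) throws Exception {
        handler.startDocument();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DocumentEventShield(handler));
        reader.parse(new InputSource(xml));
        handler.endDocument();
    }
}
```

        With this arrangement the caller's handler receives a single endDocument() no matter what the inner parser sends, so no client-side workaround is needed.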

        Jukka Zitting made changes -
        Field          Original Value   New Value
        Status         Open [ 1 ]       Resolved [ 5 ]
        Assignee                        Jukka Zitting [ jukkaz ]
        Fix Version/s                   0.9 [ 12315488 ]
        Resolution                      Fixed [ 1 ]
        Jukka Zitting made changes -
        Status         Resolved [ 5 ]   Closed [ 6 ]

          People

          • Assignee: Jukka Zitting
          • Reporter: Scott Severtson
          • Votes: 0
          • Watchers: 0
