Actually this isn't a problem of a parser outputting metadata after
The problem, for both of the test docs, is that the Outlook message
has a chunk of RTF text and so OutlookExtractor recurses into the
RTFParser, which then calls start/endDocument itself.
I can fix this by having RTFParser expose a separate parse method,
with control over whether or not it should call start/endDocument
itself; that seems to fix these two test docs.
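
To make the idea concrete, here is a minimal sketch of that kind of
split parse method. It uses only the plain SAX interfaces;
SketchParser, CountingHandler and the emitDocumentEvents flag are
hypothetical names for illustration, not the real RTFParser API.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EmbeddedParseSketch {

    /** Hypothetical stand-in for a parser such as RTFParser (not Tika's API). */
    static class SketchParser {
        // Top-level entry point: the parser emits the document events itself.
        void parse(String text, ContentHandler handler) throws SAXException {
            parse(text, handler, true);
        }

        // Embedded entry point: the caller decides whether the parser
        // should emit startDocument/endDocument.
        void parse(String text, ContentHandler handler, boolean emitDocumentEvents)
                throws SAXException {
            if (emitDocumentEvents) {
                handler.startDocument();
            }
            char[] chars = text.toCharArray();
            handler.characters(chars, 0, chars.length);
            if (emitDocumentEvents) {
                handler.endDocument();
            }
        }
    }

    /** Counts how many times the document events reach the handler. */
    static class CountingHandler extends DefaultHandler {
        int starts;
        int ends;

        @Override
        public void startDocument() {
            starts++;
        }

        @Override
        public void endDocument() {
            ends++;
        }
    }

    /**
     * Simulates an outer extractor that owns the document events and
     * recurses into the embedded parser. Returns {starts, ends}.
     */
    public static int[] run(boolean embeddedEmitsDocumentEvents) {
        CountingHandler handler = new CountingHandler();
        try {
            handler.startDocument();   // outer extractor opens the document
            new SketchParser().parse("rtf body", handler, embeddedEmitsDocumentEvents);
            handler.endDocument();     // outer extractor closes the document
        } catch (SAXException e) {
            throw new RuntimeException(e);
        }
        return new int[] { handler.starts, handler.ends };
    }

    public static void main(String[] args) {
        int[] nested = run(true);
        int[] fixed = run(false);
        System.out.println("embedded emits events: starts=" + nested[0] + " ends=" + nested[1]);
        System.out.println("embedded suppressed:    starts=" + fixed[0] + " ends=" + fixed[1]);
    }
}
```

With the flag set, the handler sees start/endDocument twice (the
nesting described above); with it cleared, it sees each once.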
However, if the Outlook message has an HTML chunk, it's also broken:
try running TikaGUI on
(that's an HTML Outlook message).
How can/should we fix that one? It's tagsoup that's calling