Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.10
- None
- None
Description
We have many Outlook .msg files whose body content is RTF, and Tika does not extract any text from these messages. The body content appears to be a mixture of RTF and HTML.
I've extracted an example RTF body (see attachment) for use with the following test case:
    ByteArrayOutputStream bytes = new ByteArrayOutputStream()
    rtfParser.parse(
        this.class.getResourceAsStream("/problems/no-text.rtf"),
        new EmbeddedContentHandler(new BodyContentHandler(bytes)),
        new Metadata(),
        new ParseContext()
    )
    assertTrue("Document is missing required text", bytes.toByteArray().length > 0)
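One way to check whether a body like this is RTF-encapsulated HTML (a guess at what "a mixture of RTF and HTML" means here, not something this report confirms) is to look for the `\fromhtml` control word that Outlook writes near the start of such bodies. The sketch below is a standalone diagnostic that uses only the JDK. The class `RtfHtmlProbe`, its method name, and the 512-character scan window are all hypothetical choices for illustration and are not part of Tika.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic helper; not part of Tika. */
class RtfHtmlProbe {

    /**
     * Returns true if the data starts like an RTF document and its header
     * region contains the \fromhtml control word, which marks RTF that
     * encapsulates an HTML body.
     */
    static boolean looksLikeEncapsulatedHtml(byte[] rtf) {
        // RTF control words are plain ASCII, so a single-byte decode is
        // sufficient for scanning and never fails on arbitrary bytes.
        String text = new String(rtf, StandardCharsets.ISO_8859_1);

        // Assumption: the marker sits near the start of the document, so
        // only an arbitrary 512-character window is inspected.
        String head = text.substring(0, Math.min(text.length(), 512));

        return head.startsWith("{\\rtf") && head.contains("\\fromhtml");
    }

    public static void main(String[] args) {
        // Tiny made-up sample, not the attachment from this issue.
        byte[] sample = "{\\rtf1\\ansi\\fromhtml1 {\\*\\htmltag64 <p>}Hello}"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(looksLikeEncapsulatedHtml(sample));
    }
}
```

If the probe returns true for the attached body, the text probably sits inside or between `\htmltag` groups, which would narrow down where the parser drops it.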
Attachments
Issue Links
- is related to
  - TIKA-2883 Text not extracted from RTF files (Resolved)