Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Hi there
We have decided to remove support for some formats when using Tika to extract text and metadata.
We defined our list of Parsers:
private static final Parser PARSERS[] = new Parser[] {
    // documents
    new org.apache.tika.parser.html.HtmlParser(),
    new org.apache.tika.parser.rtf.RTFParser(),
    new org.apache.tika.parser.pdf.PDFParser(),
    new org.apache.tika.parser.txt.TXTParser(),
    new org.apache.tika.parser.microsoft.OfficeParser(),
    new org.apache.tika.parser.microsoft.OldExcelParser(),
    new org.apache.tika.parser.microsoft.ooxml.OOXMLParser(),
    new org.apache.tika.parser.odf.OpenDocumentParser(),
    new org.apache.tika.parser.iwork.IWorkPackageParser(),
    new org.apache.tika.parser.xml.DcXMLParser(),
    new org.apache.tika.parser.epub.EpubParser(),
};

private static final AutoDetectParser PARSER_INSTANCE = new AutoDetectParser(PARSERS);

private static final Tika TIKA_INSTANCE = new Tika(PARSER_INSTANCE.getDetector(), PARSER_INSTANCE);
However, when an MS Office Word document embeds a document in an unsupported format (such as a Visio schema), a NoClassDefFoundError is raised.
Would it be possible to catch this case and throw a TikaException instead, so that the failure surfaces as an Exception rather than as an Error that can only be intercepted by catching Throwable?
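Until such a change is available, the requested behaviour can be approximated on the caller side. The following is only a minimal sketch, not Tika's own fix: `ExtractionException` is a hypothetical stand-in for `TikaException` (so the snippet compiles without Tika on the classpath), and `callGuarded` and the class name are likewise invented for illustration. It relies on the fact that `NoClassDefFoundError` is a subclass of `LinkageError`.

```java
import java.util.concurrent.Callable;

final class LinkageErrorWrapping {

    /** Hypothetical stand-in for org.apache.tika.exception.TikaException. */
    static class ExtractionException extends Exception {
        ExtractionException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the extraction and converts class-loading failures into a
     * checked exception, so callers only need to catch Exception types.
     */
    static <T> T callGuarded(Callable<T> extraction) throws ExtractionException {
        try {
            return extraction.call();
        } catch (LinkageError e) {
            // NoClassDefFoundError extends LinkageError, so a missing
            // parser dependency ends up here instead of escaping as an Error.
            throw new ExtractionException("Parser class unavailable: " + e.getMessage(), e);
        } catch (ExtractionException e) {
            throw e;
        } catch (Exception e) {
            throw new ExtractionException("Extraction failed", e);
        }
    }

    public static void main(String[] args) {
        try {
            // Simulates an embedded document whose parser class is missing.
            callGuarded(() -> {
                throw new NoClassDefFoundError("com/example/MissingParser");
            });
        } catch (ExtractionException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

In a real application the callable would wrap the actual extraction call (for example `TIKA_INSTANCE.parseToString(stream)`) and the wrapper would throw `TikaException` directly. Catching `LinkageError` rather than `Throwable` keeps unrelated errors such as `OutOfMemoryError` from being masked.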
Issue Links
- is related to TIKA-2212 Update mimes for OOXMLParser (Resolved)