Description
Related to SOLR-4809: Solr receives multiple startDocument events when parsing OpenDocument files.
The parser already prevents multiple endDocument events, but not multiple startDocument events.
The bug was introduced when we added parsing of content.xml and meta.xml (TIKA-736): both feed elements to the XHTML output, so we get multiple startDocument/endDocument events.
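
To illustrate the symptom described above, here is a minimal sketch of a SAX filter that forwards only the first startDocument and holds back endDocument until the caller explicitly finishes. It uses only the JDK's org.xml.sax classes. The class name SingleDocumentFilter and the method finish() are hypothetical names chosen for this example; this is not the actual Tika patch, only one way the guard could look.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical helper (not part of Tika): collapses repeated document
// boundary events from several sub-parsers into a single start/end pair.
class SingleDocumentFilter extends XMLFilterImpl {
    private boolean started = false;
    private boolean ended = false;

    SingleDocumentFilter(ContentHandler downstream) {
        // XMLFilterImpl forwards every event to this handler by default.
        setContentHandler(downstream);
    }

    @Override
    public void startDocument() throws SAXException {
        // Forward only the first startDocument; swallow any later ones.
        if (!started) {
            started = true;
            super.startDocument();
        }
    }

    @Override
    public void endDocument() {
        // Swallowed on purpose: a sub-parser (e.g. for meta.xml) must not
        // close the shared output while content.xml is still to come.
    }

    // Called once by the outer parser after all parts have been processed.
    void finish() throws SAXException {
        if (started && !ended) {
            ended = true;
            super.endDocument();
        }
    }

    public static void main(String[] args) throws SAXException {
        final int[] counts = new int[2]; // [0] = starts, [1] = ends
        ContentHandler sink = new DefaultHandler() {
            @Override public void startDocument() { counts[0]++; }
            @Override public void endDocument() { counts[1]++; }
        };
        SingleDocumentFilter filter = new SingleDocumentFilter(sink);

        // Simulate two sub-parsers, each emitting its own document events.
        filter.startDocument();
        filter.endDocument();
        filter.startDocument();
        filter.endDocument();
        filter.finish();

        System.out.println("startDocument=" + counts[0] + " endDocument=" + counts[1]);
    }
}
```

With a guard like this, a downstream consumer such as Solr would see exactly one startDocument and one endDocument regardless of how many parts of the OpenDocument package are parsed.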
Attachments
Issue Links
- blocks: SOLR-4809 OpenOffice document body is not indexed by SolrCell (Closed)
- breaks: SOLR-4809 OpenOffice document body is not indexed by SolrCell (Closed)
- is broken by: TIKA-736 OpenOffice parser: master footer text isn't extracted (Resolved)
- relates to: TIKA-646 tika command line can't extract metadata for OOXML files (Closed)