[TIKA-1211] OpenDocument (ODF) parser produces multiple startDocument() events - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.4
Fix Version/s: 1.5
Component/s: parser
Labels: None

Description

Related to SOLR-4809: Solr receives multiple startDocument() events when parsing OpenDocument files.

The parser already prevents multiple endDocument() events, but not multiple startDocument() events.

The bug was introduced when we added parsing of both content.xml and meta.xml (TIKA-736): both feed elements to the XHTML output, so we get multiple startDocument() and endDocument() events.
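
To illustrate the behaviour described above, here is a minimal sketch of a SAX handler wrapper that forwards only the first startDocument() and holds back endDocument() until the caller has finished feeding every part of the package. It is not the attached patch: the class names `SingleDocumentHandler` and `CountingHandler` and the `finish()` method are hypothetical and use only the standard `org.xml.sax` API.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical wrapper (not Tika code): one startDocument(), one endDocument(). */
class SingleDocumentHandler extends DefaultHandler {
    private final ContentHandler delegate;
    private boolean started = false;
    private boolean finished = false;

    SingleDocumentHandler(ContentHandler delegate) {
        this.delegate = delegate;
    }

    @Override
    public void startDocument() throws SAXException {
        // Forward only the first startDocument(); later parts are ignored.
        if (!started) {
            started = true;
            delegate.startDocument();
        }
    }

    @Override
    public void endDocument() {
        // Swallowed: each part (e.g. meta.xml, content.xml) ends here,
        // but the combined output document is not finished yet.
    }

    /** Called once by the owner after all parts have been parsed. */
    public void finish() throws SAXException {
        if (started && !finished) {
            finished = true;
            delegate.endDocument();
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        delegate.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        delegate.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        delegate.characters(ch, start, length);
    }
}

/** Hypothetical sink that counts document events, for demonstration only. */
class CountingHandler extends DefaultHandler {
    int starts = 0;
    int ends = 0;

    @Override
    public void startDocument() {
        starts++;
    }

    @Override
    public void endDocument() {
        ends++;
    }
}
```

In this sketch, a parser that processes two XML parts would wrap the downstream handler once, pass the wrapper to both part parsers, and call `finish()` after the last part, so the downstream consumer sees exactly one start and one end event.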

Attachments

TIKA-1211.patch
22/Dec/13 20:55
1 kB
Vadim Roizman

Issue Links

blocks

SOLR-4809 OpenOffice document body is not indexed by SolrCell

Closed

breaks

SOLR-4809 OpenOffice document body is not indexed by SolrCell

Closed

is broken by

TIKA-736 OpenOffice parser: master footer text isn't extracted

Resolved

relates to

TIKA-646 tika command line can't extract metadata for OOXML files

Closed

People

Assignee: Kenneth William Krugler

Reporter: Uwe Schindler

Votes: 0

Watchers: 3

Dates

Created: 17/Dec/13 12:49

Updated: 25/Mar/14 16:21

Resolved: 24/Dec/13 15:52