[SOLR-2550] Apache Solr needs an updated TIKA version in its extraction libraries - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.4.1
Fix Version/s: 1.4.2, 3.1
Component/s: contrib - Solr Cell (Tika extraction)
Labels:
- extraction
- indexing
- pdf
- secure

Description

Some PDF documents fail during indexing (specifically, in the extraction step). The underlying issue is fixed in PDFBox 1.1.0, but the Apache Solr 1.4.1 distribution does not ship the updated jars, which causes these failures. Solr 1.4.1 bundles tika-parsers 0.4, which needs to be updated to version 0.9.

Reference for the issue and its fix: https://issues.apache.org/jira/browse/PDFBOX-617
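To check whether a given installation is affected, one option is to compare the bundled jar versions against the versions named in this report. The following is a minimal sketch, not part of Solr or Tika: the function names, the `MINIMUM_VERSIONS` table, and the assumption that jars follow a `name-version.jar` naming pattern are all illustrative.

```python
import re

# Assumed minimum versions, taken from the versions named in this report.
MINIMUM_VERSIONS = {
    "tika-core": (0, 9),
    "tika-parsers": (0, 9),
    "pdfbox": (1, 1, 0),
}

# Assumes jar files are named "<artifact>-<dotted.version>.jar".
JAR_NAME = re.compile(r"^(?P<name>[A-Za-z][\w.-]*?)-(?P<version>\d+(?:\.\d+)*)\.jar$")


def parse_jar(filename):
    """Split e.g. 'tika-parsers-0.4.jar' into ('tika-parsers', (0, 4)).

    Returns None when the file name does not match the assumed pattern.
    """
    match = JAR_NAME.match(filename)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version


def _pad(version, length):
    """Right-pad a version tuple with zeros so 1.1 and 1.1.0 compare equal."""
    return version + (0,) * (length - len(version))


def outdated_jars(filenames, minimums=MINIMUM_VERSIONS):
    """Return (name, found_version, wanted_version) for each jar below its minimum."""
    stale = []
    for filename in filenames:
        parsed = parse_jar(filename)
        if parsed is None:
            continue  # not a versioned jar name
        name, found = parsed
        wanted = minimums.get(name)
        if wanted is None:
            continue  # not a library this check cares about
        width = max(len(found), len(wanted))
        if _pad(found, width) < _pad(wanted, width):
            stale.append((name, found, wanted))
    return stale
```

To use it, pass the file names from whichever directory holds the extraction libraries in your installation, for example `outdated_jars(p.name for p in pathlib.Path(lib_dir).iterdir())`, where `lib_dir` is a placeholder for that directory. An empty result means none of the listed libraries is older than the versions named above.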

People

Assignee: Steven Rowe

Reporter: Surendranadh Puranam

Dates

Created: 27/May/11 12:21

Updated: 27/Nov/11 12:39

Resolved: 26/Oct/11 00:44