[SOLR-1847] Solrj doesn't know if PDF was actually parsed by Tika - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 1.5
Fix Version/s: None
Component/s: contrib - Solr Cell (Tika extraction)
Labels:
- Solr
- Solrj
- Tika
- Tomcat6
Environment:

Tomcat 6.0.24, Solr 1.5Dev, Solrj 1.5Dev, Tika

Description

When posting PDF files through Solrj, the only thing we get back from Solr is the server response status; there is no indication of whether the PDF was actually parsed. Checking the log, I found that Tika failed on some PDF files, either because of the nature of their content (text present only as images) or because the files are corrupted:

25 mars 2010 14:54:07 org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EI

25 mars 2010 14:54:02 org.apache.pdfbox.filter.FlateFilter decode
GRAVE: Stop reading corrupt stream

The question is: how can I catch these kinds of exceptions through Solrj?
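One possible client-side workaround, sketched here under assumptions rather than confirmed by this issue: Solr Cell accepts an `extractOnly=true` request parameter that returns the extracted content instead of indexing it, so a client can inspect that content before committing to an index request. The helper below is hypothetical (the class `ExtractionCheck` and its methods are not part of Solrj). It assumes the extracted body has already been obtained as plain text (for example with `extractFormat=text`, where the Solr version supports it) and only judges whether that text looks empty, as it would for an image-only or corrupt PDF.

```java
// Hypothetical helper, not part of Solrj: decides whether text returned by an
// extractOnly request looks like an empty or failed extraction.
final class ExtractionCheck {
    private ExtractionCheck() {}

    /** Counts the letters and digits in the extracted text (0 for null). */
    static int countTextChars(String extracted) {
        if (extracted == null) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < extracted.length(); ) {
            int codePoint = extracted.codePointAt(i);
            if (Character.isLetterOrDigit(codePoint)) {
                count++;
            }
            i += Character.charCount(codePoint);
        }
        return count;
    }

    /**
     * Returns true when the text holds fewer than minTextChars letters or
     * digits. The threshold is an assumption to tune per document set.
     */
    static boolean looksUnparsed(String extracted, int minTextChars) {
        return countTextChars(extracted) < minTextChars;
    }
}
```

A caller could run `ExtractionCheck.looksUnparsed(text, 1)` on the extracted body and flag the file for manual review when it returns true. This detects only empty output; it does not surface the PDFBox log messages themselves.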

People

Assignee: Unassigned

Reporter: elsadek

Votes: 0

Watchers: 3

Dates

Created: 26/Mar/10 08:48

Updated: 02/Oct/16 02:32

Resolved: 02/Oct/16 02:32