  1. Solr
  2. SOLR-1847

Solrj doesn't know if PDF was actually parsed by Tika


    Details

      Description

      When posting PDF files using SolrJ, the only response we get from Solr is the server response status, so we never know whether
      the PDF was actually parsed. Checking the log, I found that Tika failed
      on some PDF files, either because of the nature of their content (text present only in images) or because the files are corrupted:

      25 mars 2010 14:54:07 org.apache.pdfbox.util.PDFStreamEngine processOperator
      INFO: unsupported/disabled operation: EI

      25 mars 2010 14:54:02 org.apache.pdfbox.filter.FlateFilter decode
      GRAVE: Stop reading corrupt stream

      The question is: how can I catch these kinds of exceptions through SolrJ?
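
One possible workaround, offered as an assumption rather than a confirmed answer to this issue: Solr's extracting request handler accepts an `extractOnly=true` parameter that returns the content Tika extracted without indexing it, so a client could request that first and check whether any text came back. The sketch below covers only that check, not the request itself. The class name `ExtractionCheck`, the method `hasText`, and the `minChars` threshold are hypothetical and not part of SolrJ.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: decides whether extracted content contains usable text. */
final class ExtractionCheck {
    // Matches markup tags such as <p> or </body>, since extracted content may be XHTML.
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");

    private ExtractionCheck() {}

    /**
     * Returns true if, after removing markup tags, the extracted content
     * contains at least minChars non-whitespace characters.
     * minChars is an illustrative threshold to tune per corpus.
     */
    static boolean hasText(String extracted, int minChars) {
        if (extracted == null) {
            return false;
        }
        String stripped = TAGS.matcher(extracted).replaceAll("");
        int count = 0;
        for (int i = 0; i < stripped.length(); i++) {
            if (!Character.isWhitespace(stripped.charAt(i))) {
                count++;
            }
        }
        return count >= minChars;
    }
}
```

A caller might treat a file as unparsed when `hasText(extracted, 5)` returns false. This is a heuristic: it can flag image-only PDFs that yield no text, but it says nothing about why extraction produced nothing.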

            People

            • Assignee:
              Unassigned
              Reporter:
              elsadek elsadek
            • Votes:
              0
              Watchers:
              3
