Solr / SOLR-2550

Apache Solr needs an updated Tika version in its extraction libraries

    Details

      Description

      Some PDF documents fail when they are indexed (or, more precisely, extracted). The underlying problem is fixed in PDFBox 1.1.0, but Apache Solr 1.4.1 does not ship the latest versions of these jars, which causes the failures. The Solr 1.4.1 distribution includes tika-parsers 0.4, which needs to be updated to version 0.9.

      Reference for the issue and its solution: https://issues.apache.org/jira/browse/PDFBOX-617

            People

            • Assignee:
              steve_rowe Steve Rowe
              Reporter:
              spuanam Surendranadh Puranam