Solr / SOLR-2550

Apache Solr needs an updated TIKA version in its extraction libraries

    Details

      Description

      Some PDF documents fail when they are indexed (that is, during text extraction). The underlying issue is fixed in PDFBox 1.1.0, but Apache Solr 1.4.1 does not ship the latest version of these jars, which causes the failures. The Solr 1.4.1 distribution includes tika-parsers 0.4, which needs to be updated to version 0.9.

      Reference for the issue and the solution: https://issues.apache.org/jira/browse/PDFBOX-617

        Activity

        Jan Høydahl added a comment -

        There will probably be no 1.4.2 release. I recommend voting for SOLR-2372 to get Tika 0.9 into Solr 3.3, and then upgrading to 3.3 (which is trivial).

        Steve Rowe added a comment -

        Solr Cell was upgraded to Tika 0.8, which includes PDFBox 1.1.0, in the Solr 3.1 release.

        Uwe Schindler added a comment -

        Bulk close after release of 3.1


          People

          • Assignee: Steve Rowe
          • Reporter: Surendranadh Puranam
          • Votes: 0
          • Watchers: 1
