  1. Solr
  2. SOLR-2460

Some European characters cannot be parsed correctly for some PDFs

    Details

      Description

      The Norwegian characters (æ, ø and å) in the following PDF document are not displayed correctly after Solr has indexed the document using Solr Cell:
      http://ridder.uio.no/dokument.pdf

      If I manually change the version of PDFBox (one of Tika's dependencies) to 1.4.0, the document is parsed correctly.

      I suggest that the next release of Solr ship with Tika 0.9, which has also updated its PDFBox dependency to 1.4.0.
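
One way to compare the result before and after swapping the PDFBox jar is to check the extracted text for the three letters. The following is a minimal sketch, not part of the original report: it assumes the extracted text has already been obtained as a string (for example from a stored field), and the helper names `missing_letters` and `has_replacement_char` are hypothetical.

```python
import unicodedata

# The Norwegian letters named in this report.
EXPECTED = "æøå"


def missing_letters(text, expected=EXPECTED):
    """Return the expected letters that never occur in text (case-insensitive)."""
    # NFC folds a decomposed "a" + combining ring into the single code point "å",
    # so both encodings of the letter are counted as present.
    normalized = unicodedata.normalize("NFC", text).lower()
    return [ch for ch in expected if ch not in normalized]


def has_replacement_char(text):
    """Return True if text contains U+FFFD, a common sign of a failed decode."""
    return "\ufffd" in text
```

An empty list from `missing_letters` only shows that each letter occurs somewhere in the text. It does not prove that every occurrence was extracted correctly, so a visual comparison against the PDF is still needed.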

            People

            • Assignee:
              sarowe Steven Rowe
              Reporter:
              erlendfg Erlend Garåsen