Details
Bug
Status: Closed
Minor
Resolution: Fixed
1.4.1, 3.1
None
Tika, PDFBox
Description
The Norwegian characters æ, ø and å in the following PDF document are not displayed correctly after Solr has indexed it with Solr Cell:
http://ridder.uio.no/dokument.pdf
If I manually change the version of PDFBox (one of Tika's dependencies) to 1.4.0, the document will parse correctly.
I suggest that the next release of Solr ship with Tika 0.9, which has also updated its PDFBox dependency to 1.4.0.
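For anyone wanting to compare PDFBox versions, a rough way to check the result is to look at the text that comes out of extraction and see whether æ, ø and å survived. The sketch below is only an illustration: the class and method names are made up for this example, it uses plain Java with no Tika or Solr calls, and it assumes the extracted text has already been obtained as a String by whatever means (for example from the indexed field). It also assumes the document is known to contain all three letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Solr, Tika or PDFBox.
public class NorwegianCharCheck {

    // The letters the report says are mangled, written as Unicode escapes
    // so the check does not depend on the encoding of this source file.
    private static final char[] EXPECTED = {'\u00e6', '\u00f8', '\u00e5'};

    /** Returns the expected letters that do not occur anywhere in the text. */
    static List<Character> missingLetters(String extractedText) {
        // Lower-case first so that upper-case forms also count as present.
        String lower = extractedText.toLowerCase(Locale.ROOT);
        List<Character> missing = new ArrayList<>();
        for (char c : EXPECTED) {
            if (lower.indexOf(c) < 0) {
                missing.add(c);
            }
        }
        return missing;
    }

    /** True if the text contains U+FFFD, a common sign of a failed decoding. */
    static boolean hasReplacementChar(String extractedText) {
        return extractedText.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // Sample strings stand in for real extraction output (assumption).
        String good = "bl\u00e5b\u00e6rsyltet\u00f8y";
        String bad = "bl?b?rsyltet?y";
        System.out.println("good: missing " + missingLetters(good));
        System.out.println("bad:  missing " + missingLetters(bad));
    }
}
```

Running the same check on the text extracted with the old and the new PDFBox version should show whether the upgrade makes a difference for a given document. A missing letter is only a hint, since the check cannot tell a mangled character from one that was never in the document.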