Affects Version/s: 1.8
Fix Version/s: None
The PDF document at http://www.grdc.com.au/uploads/documents/Rust%20Biosecurity%20Brochure.pdf, when converted with Tika v1.1 using
produces substantially worse output than xpdf's pdftotext program.
Specifically, we see...
Some 'spaces' replaced with question marks
and some odd case conversions
(The original document seems to contain "SOURCE: BRAD COLLIS" all in upper case.)
To compare that with pdftotext
This does not output the question marks, and produces "Source: BRAD COLLIS" at the end, both of which seem to be improvements. Note that it does, however, produce a number of ^G characters, which are not desirable.
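
For anyone wanting to compare the two outputs more systematically, here is a rough Python sketch that counts the two artifacts mentioned in this report: question marks and ^G (ASCII BEL, 0x07) characters. The function names are hypothetical and not part of Tika or xpdf. The question-mark count is only a heuristic, since it also includes any legitimate question marks in the text.

```python
def count_artifacts(text: str) -> dict:
    """Count characters this report flags as possible extraction artifacts.

    Assumption: 'text' is the already-extracted output of either tool,
    read into a Python string by the caller.
    """
    return {
        # Heuristic only: legitimate question marks are counted too.
        "question_marks": text.count("?"),
        # ^G is the ASCII BEL control character (0x07).
        "bell_chars": text.count("\x07"),
    }


def strip_bell(text: str) -> str:
    """Remove ^G (BEL) characters from extracted text."""
    return text.replace("\x07", "")
```

Running count_artifacts on the output of each tool gives a quick side-by-side figure, and strip_bell could serve as a workaround for the ^G characters in the pdftotext output.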