Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
1.13
None
None
org.apache.tika.parser.pdf.PDFParser
Description
I get garbled output when parsing this PDF:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.3977&rep=rep1&type=pdf
I sent it with a PUT request to /tika and the header Accept: text/html.
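A minimal sketch of that request using only the Python standard library might look like this. It assumes tika-server is running locally on its default port 9998. The explicit Content-Type: application/pdf header is an addition not mentioned in the report. It is set because urllib would otherwise default to a form-encoded content type. The function names are hypothetical.

```python
import urllib.request

# Assumption: tika-server is running locally on its default port (9998).
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(pdf_bytes: bytes, accept: str = "text/html") -> urllib.request.Request:
    """Build a PUT request that sends raw PDF bytes to tika-server's /tika endpoint."""
    return urllib.request.Request(
        TIKA_URL,
        data=pdf_bytes,
        method="PUT",
        headers={
            "Accept": accept,  # ask for XHTML output, as in the report
            # Assumption: declare the type explicitly so urllib does not send
            # its default application/x-www-form-urlencoded Content-Type.
            "Content-Type": "application/pdf",
        },
    )


def extract(pdf_path: str) -> str:
    """Send a local PDF to tika-server and return the response body as text."""
    with open(pdf_path, "rb") as f:
        req = build_tika_request(f.read())
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

For example, calling extract() on a locally downloaded copy of the CiteSeerX PDF should return the HTML that the excerpt below is taken from.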
An extract of the output:
"<p>��������� ��
���������������������������� �!"���
</p>
<p>#$�% ���!"�%&'��+�,!-���
</p>
<p>
.�� ��/�� 10��������� �!"21� �434�%54!"�6�
</p>
<p>7�8:9�;�<>=@?�A�9�BDC
</p>
<p>E A FHG�9DI"JLK�M�NLOPJLB�N�J.Q�JLGR8:K-I"FSJLB�I
</p>
<p>E M T"U:V@TXW Y�U Z�NLI"A [RJLK
]U U:V</p>
<p/>"
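Output like this could be flagged automatically by measuring how much of the extracted text consists of U+FFFD replacement characters. The sketch below is only a heuristic. The regex tag stripping and the 0.2 threshold are illustrative assumptions, not Tika behavior. It also does not catch runs of wrong-but-valid characters such as "JLK" in the excerpt above.

```python
import re

REPLACEMENT_CHAR = "\ufffd"


def garbled_ratio(html: str) -> float:
    """Fraction of non-whitespace text characters that are U+FFFD."""
    # Crude tag removal; good enough for a heuristic, not a real HTML parser.
    text = re.sub(r"<[^>]*>", "", html)
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c == REPLACEMENT_CHAR for c in chars) / len(chars)


def looks_garbled(html: str, threshold: float = 0.2) -> bool:
    """Assumption: treat output as garbled once U+FFFD reaches `threshold`."""
    return garbled_ratio(html) >= threshold
```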