Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.10
Fix Version/s: None
Component/s: None
Description
[ruby & tika-server-1.10] The PDF file http://ratsinfo.dresden.de/getfile.php?id=52546&type=do is converted to an XML file that contains the full converted text, the metadata, and the XML structure twice. That violates the XML specification, and my downstream XML parser crashes on it.
I also tried givemetext.okfnlabs.org, which uses Tika-server plus OCR, with this file, and it outputs nothing at all for it.
Thousands of other files are converted correctly, but not this one.
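
Until the cause is found, a client-side check along these lines might help detect the duplicated output before it reaches the XML parser. This is only a sketch: it assumes that the duplication shows up as two complete `<html>...</html>` documents concatenated in one response, which the report suggests but does not confirm. The helper names `html_root_count` and `first_html_document` are hypothetical and not part of Tika or any Ruby library.

```ruby
# Hypothetical helper: counts opening <html> tags in the text returned by
# tika-server. Assumption: a well-formed response has exactly one.
def html_root_count(xml)
  xml.scan(/<html[\s>]/i).length
end

# Hypothetical helper: returns everything up to and including the first
# closing </html> tag, so that a parser sees a single root element.
# If no closing tag is found, the input is returned unchanged.
def first_html_document(xml)
  closing = "</html>"
  idx = xml.index(closing)
  return xml if idx.nil?

  xml[0, idx + closing.length]
end

# Illustrative input only, not the actual output for the PDF in this report.
doc = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>text</p></body></html>'
duplicated = doc + doc

if html_root_count(duplicated) > 1
  warn "tika output contains more than one <html> root, keeping the first"
  duplicated = first_html_document(duplicated)
end
```

Truncating at the first `</html>` is a workaround, not a fix: it assumes both copies are identical, so that discarding the second loses nothing. That should be verified against the real output for this file.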