Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.7
-
None
-
None
-
linux, oracle jdk 7u75
Description
On tika-1.8-rc1.
java -jar tika-app/target/tika-app-1.8.jar -x 2.rtf returns
<?xml version="1.0" encoding="UTF-8"?><div xmlns="http://www.w3.org/1999/xhtml">HOHcvanAHTI'Imoc v8 Hanemnan npfiBOBafi "DRAW </div> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- tail omitted -->
Removing image prevents such behavior (3.rtf doesn't contain embedded image).
Update: you should have tesseract installed to reproduce this issue.