Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.1.0
Environment
OS: Windows 10 Pro, version 10.0.19043 (Build 19043)
Java:
openjdk version "1.8.0-262"
OpenJDK Runtime Environment (build 1.8.0-262-b10)
OpenJDK 64-Bit Server VM (build 25.71-b10, mixed mode)
OCR:
Tesseract 5
Description
Tika cannot extract the text from the attached .eml file. Instead, it outputs what appears to be the encoded content of the attachments.
This does not happen with every .eml file, but we have not been able to identify what triggers it. The same message saved in .msg format is extracted correctly.
The extracted .txt file has the same size as the original .eml file.
I will attach the .eml file and the output produced by Tika.
The command used is:
java -jar tika-app-2.1.0.jar path\to\eml_test.eml > output.txt
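One possible way to narrow down what distinguishes the affected .eml files is to compare the MIME structure of a failing message with one that extracts correctly. The following is a minimal, stdlib-only Java 8 sketch for that, under stated assumptions: the class name EmlHeaderScan is hypothetical and not part of Tika, and it is a simple line scan (joining folded header lines), not a full MIME parser, so a body line that happens to start with "Content-Type:" would also be listed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical diagnostic helper (not part of Tika): lists every Content-Type and
 * Content-Transfer-Encoding header line found in an .eml file. Plain line scan only.
 */
public class EmlHeaderScan {

    /** Returns each matching header, with folded continuation lines joined by a space. */
    public static List<String> scan(List<String> lines) {
        List<String> found = new ArrayList<>();
        StringBuilder current = null; // header currently being collected, if any
        for (String line : lines) {
            boolean continuation = !line.isEmpty()
                    && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
            if (current != null && continuation) {
                // RFC 5322 folding: a line starting with whitespace continues the previous header
                current.append(' ').append(line.trim());
                continue;
            }
            if (current != null) {
                found.add(current.toString());
                current = null;
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("content-type:")
                    || lower.startsWith("content-transfer-encoding:")) {
                current = new StringBuilder(line.trim());
            }
        }
        if (current != null) {
            found.add(current.toString());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps every byte to a char, so raw 8-bit message content cannot fail to decode
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (String header : scan(lines)) {
            System.out.println(header);
        }
    }
}
```

It can be compiled and run with javac EmlHeaderScan.java followed by java EmlHeaderScan path\to\eml_test.eml, and the output diffed against the same run on a message that Tika extracts correctly.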