Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.18
Environment: Windows 10
Description
I have a small text file in two versions:
- a DOS version of the file (CRLF line endings)
- a UNIX version of the file (LF line endings)
Both contain the same text:
La politique macroéconomique cesse officiellement d’être
l’alpha et l’oméga de la lutte contre le chômage.
When I parse them with tika-app.jar, the text is extracted correctly from the DOS version of the file. For the UNIX version, the apostrophes are incorrectly rendered as question marks.
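To help narrow this down, here is a minimal sketch of one possible explanation. It assumes the files are stored as windows-1252, which the report does not state. In that encoding the apostrophe ’ (U+2019) is the single byte 0x92. If the same byte is decoded as ISO-8859-1 instead, it becomes the control character U+0092, which has no printable glyph. The snippet does not call Tika; it only illustrates the decoding difference, and all variable names are illustrative.

```python
# Part of the sample text from the report; it contains U+2019
# (right single quotation mark) and U+00E9 (e with acute accent).
text = "l\u2019alpha et l\u2019om\u00e9ga"

# Assumption: the files on disk are windows-1252 encoded (not stated in the report).
dos_bytes = (text + "\r\n").encode("cp1252")   # DOS line ending (CRLF)
unix_bytes = (text + "\n").encode("cp1252")    # UNIX line ending (LF)

# Decoding with the matching charset restores the apostrophe.
good = unix_bytes.decode("cp1252")

# Decoding as ISO-8859-1 maps byte 0x92 to the control character U+0092 instead.
bad = unix_bytes.decode("latin-1")

print(repr(good))
print(repr(bad))
```

If the two files differ only in their line endings, as in this sketch, then a different result for the UNIX file would suggest that the detected charset depends on the line endings. That is a hypothesis to check, not a confirmed cause.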