Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
2.0.0, 3.0.0 PDFBox
None
Environment: OS X 10.11.4, Java 1.8.0_73-b02
Description
When the command-line ExtractText tool is run with the -html option, the output file always contains the following meta tag declaring UTF-16, regardless of the encoding actually used to write the output:
<meta http-equiv="Content-Type" content="text/html; charset="UTF-16">
Editors that honor this meta tag (Emacs, for example) then decode the file as UTF-16 and display garbled content.
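A minimal sketch of the idea behind the report, not PDFBox's actual fix: build the meta tag from the same Charset the output Writer encodes with, so the declared and actual encodings cannot disagree. The class and method names (`HtmlMetaCharset`, `metaTag`, `writeHtmlHeader`) are hypothetical and are not PDFBox API. The sketch also quotes the `content` attribute only once; in the tag quoted above, the extra quote before `UTF-16` ends the attribute early.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
public class HtmlMetaCharset {

    // Declare the charset actually used for the output instead of hard-coding
    // "UTF-16". The charset name sits inside the single-quoted content value.
    static String metaTag(Charset charset) {
        return "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
                + charset.name() + "\">";
    }

    // Write a minimal HTML header through a Writer that encodes with the same
    // charset the meta tag declares.
    static void writeHtmlHeader(OutputStream out, Charset charset) throws IOException {
        Writer writer = new OutputStreamWriter(out, charset);
        writer.write("<html><head>\n");
        writer.write(metaTag(charset));
        writer.write("\n</head><body>\n");
        writer.flush(); // flush but do not close: the caller owns the stream
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeHtmlHeader(buffer, StandardCharsets.UTF_8);
        // toString(String) keeps this compatible with Java 8
        System.out.print(buffer.toString("UTF-8"));
    }
}
```

Passing the Charset explicitly, rather than reading it back from the writer, avoids `OutputStreamWriter.getEncoding()`, which returns historical names such as "UTF8" that are not the preferred names for an HTML charset declaration.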
Issue Links
- relates to: PDFBOX-2384 ExtractText should default to UTF-8 (Closed)