Description
This bug is linked to TIKA-2671, but it concerns the bytes-based detection itself rather than metadata.
While reading the specification, I collected a list of sample cases where HtmlEncodingDetector deviates from the specification and thus fails to detect the right charset.
I am attaching the test cases to this issue:
Attachments
Issue Links
- relates to TIKA-2933 Revisit "replacement" encoding mappings in StandardHtmlEncodingDetector. (Open)