Details
- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- 1.2
- None
Description
As per TIKA-431, the Content-Encoding field in HTTP response headers specifies the compression (gzip, deflate, etc.) applied to the response data, not the charset (text encoding).
Currently Tika returns the charset from a parse request via Metadata.CONTENT_ENCODING. That usage should be deprecated and eventually phased out, e.g. in version 2.0.
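To illustrate the distinction, here is a minimal sketch in plain Java with no Tika dependency. The class `HeaderFields` and its methods `charsetOf` and `isCompressed` are hypothetical names invented for this example and are not part of Tika's API. It assumes the charset travels as a parameter of the Content-Type value, while Content-Encoding carries only a compression coding. Parameter parsing is deliberately simplified: it splits on `;` and does not handle quoted values that themselves contain semicolons.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper (not a Tika class): keeps the two header concepts apart.
final class HeaderFields {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=UTF-8". Returns empty when no charset is given.
    static Optional<String> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the remaining parts are parameters.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim();
            if (name.equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip one pair of surrounding double quotes, if present.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }

    // True when Content-Encoding names a coding other than "identity",
    // i.e. the body was transformed (gzip, deflate, ...) for transport.
    static boolean isCompressed(String contentEncoding) {
        if (contentEncoding == null) {
            return false;
        }
        String v = contentEncoding.trim().toLowerCase(Locale.ROOT);
        return !v.isEmpty() && !v.equals("identity");
    }

    public static void main(String[] args) {
        // The charset comes from Content-Type, never from Content-Encoding.
        System.out.println(charsetOf("text/html; charset=UTF-8").orElse("(none)"));
        System.out.println(isCompressed("gzip"));
    }
}
```

Under this reading, a value such as `gzip` never belongs in a charset field, and a value such as `UTF-8` never belongs in Content-Encoding.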
Issue Links
- relates to TIKA-431 (Resolved): Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly.