Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
0.9.7
None
None
Environment: Fedora Core 6
Description
APRCharsetDecoder (in charsetdecoder.cpp) decodes input characters according to the "input" charset. That charset is derived from the system locale; ultimately it comes from the call to nl_langinfo(CODESET) in apr_os_locale_encoding (in charset.c). Unfortunately, the behavior of nl_langinfo appears to be inconsistent across Linux flavors. According to the man page, it is supposed to return the same string as the "locale charmap" command. However, this is not the case on Fedora Core 6:
locale charmap
UTF-8
but nl_langinfo(CODESET) returns ANSI_X3.4-1968. As a result, APRCharsetDecoder::decode() misinterprets non-ASCII characters and replaces them with "?".
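The fix is simple: call setlocale(LC_ALL, "") before the call to nl_langinfo. A C or C++ program runs in the "C" locale until it calls setlocale, and ANSI_X3.4-1968 is the name glibc reports for that locale's codeset (plain ASCII), so without this call the environment's UTF-8 setting is never picked up. I added the call as the first line of the APRCharsetDecoder constructor, which fixed the issue for me.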
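For anyone who wants to reproduce this outside log4cxx, here is a minimal sketch of the difference, assuming a POSIX system that provides <langinfo.h>. The two helper names are hypothetical and are not part of log4cxx or APR; they only wrap the standard setlocale and nl_langinfo calls.

```cpp
#include <clocale>     // std::setlocale, LC_ALL
#include <langinfo.h>  // nl_langinfo, CODESET (POSIX)
#include <string>

// Hypothetical helper: the codeset reported while the process is in the
// "C" locale, which is the state every program starts in. The locale is
// set explicitly here only so the result does not depend on call order.
std::string codesetInCLocale() {
    std::setlocale(LC_ALL, "C");
    return nl_langinfo(CODESET);
}

// Hypothetical helper: the codeset reported after adopting the locale
// named by the environment (LANG, LC_ALL, LC_CTYPE, ...). This mirrors
// the fix described above. If the environment names a locale that is not
// installed, setlocale returns a null pointer and the locale is unchanged.
std::string codesetFromEnvironment() {
    std::setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}
```

Printing both values on an affected machine should show the mismatch: the first is the "C" locale's codeset, the second should match the output of "locale charmap". Note that setlocale changes process-wide state, so calling it from a library constructor also affects the rest of the application.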
As a side note, it would be nice if the "input" locale could be set through configuration, much as it can be for individual Appenders.