Log4cxx
  1. Log4cxx
  2. LOGCXX-167

system locale charmap is not determined properly on Fedora Core 6

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.7
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels:
      None
    • Environment:
      Fedora Core 6

      Description

      APRCharsetDecoder (in charsetdecoder.cpp) is used to "decode" the input characters based on the "input" charset. The input charset is determined based on the system locale; eventually, this is done by the call to nl_langinfo(CODESET) in apr_os_locale_encoding (in charset.c). Unfortunately, the behavior of nl_langinfo is somewhat inconsistent across various Linux flavors. According to the man page, it's supposed to return the same info as "locale charmap" command. However, this is not the case on Fedora Core 6:

      locale charmap
      UTF-8

      but nl_langinfo(CODESET) returns ANSI_X3.4-1968. Naturally, this results in non-ASCII characters being completely misinterpreted and replaced by "?" by APRCharsetDecoder::decode().

      The fix is actually very simple - we just need to call setlocale(LC_ALL, "") before the call to nl_langinfo. I've added it as the first line of APRCharsetDecoder constructor, which fixed the issue for me.

      As a side note, it'd be nice to be able to set the "input" locale through configuration similar to the way it can be done for individual Appenders.

      1. setlocale.diff
        1.0 kB
        Igor Slepchin

        Activity

        Hide
        Curt Arnold added a comment -

        This would seem to be an APR bug that should be fixed at that level. Calling setlocale(LC_ALL, "") would be specific to Unix type systems and we try to keep all platform dependence in the APR level if at all possible. At least the call to setlocale() would need to be conditionalized to be avoided an platforms where it was not supported. If you do raise this on apr-dev or log an APR bug, please provide add a comment to this bug with the APR bug or mail archive URL.

        log4j doesn't have the concept of an log API character encoding as all strings are inherently UTF-16. Configuration files have encoding conventions that are independent of the locale settings (property files are always UTF-8, XML files have their own means of determining encoding). So there isn't a log4j pattern to go by. If you have suggestions or want to discuss it, raise it on log4cxx-dev for preliminary discussion and/or log another bug for the enhancement request.

        Show
        Curt Arnold added a comment - This would seem to be an APR bug that should be fixed at that level. Calling setlocale(LC_ALL, "") would be specific to Unix type systems and we try to keep all platform dependence in the APR level if at all possible. At least the call to setlocale() would need to be conditionalized to be avoided an platforms where it was not supported. If you do raise this on apr-dev or log an APR bug, please provide add a comment to this bug with the APR bug or mail archive URL. log4j doesn't have the concept of an log API character encoding as all strings are inherently UTF-16. Configuration files have encoding conventions that are independent of the locale settings (property files are always UTF-8, XML files have their own means of determining encoding). So there isn't a log4j pattern to go by. If you have suggestions or want to discuss it, raise it on log4cxx-dev for preliminary discussion and/or log another bug for the enhancement request.
        Hide
        Curt Arnold added a comment -

        Looked at this a bit more. It seems fairly well defined that you should call setlocale(LC_CTYPE or LC_ALL, "") if you expect to respect the locale specified by the environment variables. I'm guessing without that call, it is defaulting to the "C" locale which resulted in the assumption that you wanted US-ASCII as your charset.

        setlocale is not called by any function in APR, so there is not an APR way to force initialization of the locale. You just have to call the POSIX setlocale() method at the appropriate time. If your app is anticipating non-US-ASCII characters, it should be calling setlocale() or any C RTL call that depends on the charset would not behave as expected. It should also affect the MbstowcsCharsetDecoder in a similar manner.

        Changing the locale of the app seems like a too big of a significant side-effect for using log4cxx. So I don't see that it could be added to log4cxx or APR, just that you need to do it at the right place in your app which might be complicated if you are doing any logging before entering main or the like. I considered just using the UTF8CharsetDecoder or ISO8859CharsetDecoder if apr_os_locale_charset returned US-ASCII, but things would not be pretty if it guessed wrong.

        I'll close the bug report if there is no further discussion.

        Show
        Curt Arnold added a comment - Looked at this a bit more. It seems fairly well defined that you should call setlocale(LC_CTYPE or LC_ALL, "") if you expect to respect the locale specified by the environment variables. I'm guessing without that call, it is defaulting to the "C" locale which resulted in the assumption that you wanted US-ASCII as your charset. setlocale is not called by any function in APR, so there is not an APR way to force initialization of the locale. You just have to call the POSIX setlocale() method at the appropriate time. If your app is anticipating non-US-ASCII characters, it should be calling setlocale() or any C RTL call that depends on the charset would not behave as expected. It should also affect the MbstowcsCharsetDecoder in a similar manner. Changing the locale of the app seems like a too big of a significant side-effect for using log4cxx. So I don't see that it could be added to log4cxx or APR, just that you need to do it at the right place in your app which might be complicated if you are doing any logging before entering main or the like. I considered just using the UTF8CharsetDecoder or ISO8859CharsetDecoder if apr_os_locale_charset returned US-ASCII, but things would not be pretty if it guessed wrong. I'll close the bug report if there is no further discussion.
        Hide
        Curt Arnold added a comment -

        Revision 525393 added calls to setlocale to the tests and example programs.

        log4cxx can't set the locale, but it does need to respond to locale changes since the default decoder will likely be established on a static getLogger call before the app calls setlocale. I'm expecting to remove the APR_LOCALE_CHARSET special case from APRCharset*coder classes and add an APRLocaleCharsetEncoder and Decoder classes that check if the return value or apr_os_locale_encoding has changed.

        Show
        Curt Arnold added a comment - Revision 525393 added calls to setlocale to the tests and example programs. log4cxx can't set the locale, but it does need to respond to locale changes since the default decoder will likely be established on a static getLogger call before the app calls setlocale. I'm expecting to remove the APR_LOCALE_CHARSET special case from APRCharset*coder classes and add an APRLocaleCharsetEncoder and Decoder classes that check if the return value or apr_os_locale_encoding has changed.
        Hide
        Curt Arnold added a comment -

        Committed fixes in rev 594658. APRLocaleCharsetEncoder and Decoder scan the input source to detect if there is any non-ASCII characters. If not, just passes them through. If there are, it gets the current encoding, checks it against the previously determined encoding and updates an encapsulated APRCharsetEncoder or Decoder before delegating to it.

        Show
        Curt Arnold added a comment - Committed fixes in rev 594658. APRLocaleCharsetEncoder and Decoder scan the input source to detect if there is any non-ASCII characters. If not, just passes them through. If there are, it gets the current encoding, checks it against the previously determined encoding and updates an encapsulated APRCharsetEncoder or Decoder before delegating to it.

          People

          • Assignee:
            Curt Arnold
            Reporter:
            Igor Slepchin
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development