  1. Log4cxx
  2. LOGCXX-340

Transcoder::encodeCharsetName bungles encoding

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11.0
    • Component/s: Appender
    • Labels:
      None

      Description

      On Aug 20, 2009, at 9:06 PM, shadow king wrote on log4cxx-user:

      Hi,

      I am Chinese and I use log4cxx as the logging facility in my project (the locale on my Linux server is set to "zh_CN.GBK").

      When I switched to the 0.10.0 release (I used version 0.97 beta before), I came across a problem: none of the Chinese log messages produced by my program were displayed correctly.

      I therefore examined the source and found what I suspect is the cause of my problem. The suspect code is:

      std::string Transcoder::encodeCharsetName(const LogString& val) {
          char asciiTable[] = { ' ', '!', '"', '#', '$', '%', '&', '\'', '(', ')', '*', '+', ',', '-', '.', '/',
                                '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?',
                                '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
                                'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_',
                                '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
                                'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~', ' ' };
          std::string out;
          for (LogString::const_iterator iter = val.begin();
               iter != val.end();
               iter++) {
              if (*iter >= 0x30 && *iter < 0x7F) {
                  out.append(1, asciiTable[*iter - 0x30]); // this is the problematic line of code for me.
              } else {
                  out.append(1, LOSSCHAR);
              }
          }
          printf(out.c_str());
          return out;
      }

      I replaced the line "out.append(1, asciiTable[*iter - 0x30]);" with "out.append(1, *iter);", and my problem was solved. (The input argument of this function is "GBK" on my system. Before I changed the code, the function returned "12;"; after the change, it returns "GBK", which is the result I want.)

      I don't understand why the charset name needs to be changed at all (for fear of non-ASCII charset names? Even so, I can't see the need to change "GBK" to "12;").
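
      The mismatch described in the report can be reproduced in isolation. The sketch below is an illustration only, not the log4cxx source and not necessarily the change that shipped in 0.11.0. It assumes a narrow std::string in place of LogString, a hypothetical kLossChar in place of LOSSCHAR (whose value the report does not show), and hypothetical function names. The table begins at 0x20 (space), while the reported code range-checks and indexes from 0x30, so every accepted character lands 0x10 entries too early.

```cpp
#include <string>

// Hypothetical stand-in for LOSSCHAR; the real value is not given in the report.
static const char kLossChar = '?';

// Printable ASCII 0x20..0x7E, so index 0 corresponds to 0x20 (space).
static const char kAsciiTable[] =
    " !\"#$%&'()*+,-./0123456789:;<=>?"
    "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_"
    "`abcdefghijklmnopqrstuvwxyz{|}~";

// Mirrors the reported logic: the range check and the index both use 0x30,
// although the table starts at 0x20.
std::string encodeWithReportedOffset(const std::string& name) {
    std::string out;
    for (unsigned char c : name) {
        if (c >= 0x30 && c < 0x7F) {
            out.append(1, kAsciiTable[c - 0x30]);
        } else {
            out.append(1, kLossChar);
        }
    }
    return out;
}

// One possible correction (an assumption): make the range check and the
// index agree with the first table entry, 0x20.
std::string encodeWithTableOffset(const std::string& name) {
    std::string out;
    for (unsigned char c : name) {
        if (c >= 0x20 && c < 0x7F) {
            out.append(1, kAsciiTable[c - 0x20]);
        } else {
            out.append(1, kLossChar);
        }
    }
    return out;
}
```

      With the table as posted, encodeWithReportedOffset("GBK") yields "72;" in this sketch (the report quotes "12;"), while encodeWithTableOffset("GBK") returns "GBK" unchanged. Unlike the reporter's workaround of appending *iter directly, the second variant still replaces characters outside printable ASCII with the loss character.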

            People

            • Assignee: Curt Arnold (carnold@apache.org)
            • Reporter: Curt Arnold (carnold@apache.org)