Log4net / LOG4NET-229

Japanese characters get garbled with log4net.Layout.XmlLayoutSchemaLog4j

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.10
    • Fix Version/s: 1.2.11
    • Component/s: Appenders
    • Labels: None
    • Environment: log4net 1.2.10, .NET 2.0

    Description

      With XmlLayoutSchemaLog4j, all Japanese characters (as far as I can see) are replaced with '?'
      because the log4net.Util.Transform.INVALIDCHARS regular expression is not correct.
      This issue may also affect other languages, such as Chinese or Korean.

      http://issues.apache.org/jira/browse/LOG4NET-22 says that the permitted characters are

      #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF

      but the regex for invalid characters is

      private static Regex INVALIDCHARS=new Regex(@"[^\x09\x0A\x0D\x20-\xFF\u00FF-\u07FF\uE000-\uFFFD]",RegexOptions.Compiled);

      so 0x0800 ~ 0xD7FF, the range that contains the Japanese characters, is mistreated as invalid.

      0xD800 ~ 0xDFFF should also be permitted, because these values are used as surrogate pairs to express 0x10000 ~ 0x10FFFF in UTF-16
      (as Unicode code points 0xD800 ~ 0xDFFF are invalid, but as UTF-16 code units they are OK).
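
      The point about surrogates can be checked with a small sketch. It is only an illustration: the class name SurrogateDemo and the helper Utf16Units are hypothetical and not part of log4net. It shows that a .NET string stores a code point above 0xFFFF as two char values taken from 0xD800 ~ 0xDFFF, which a char-based regex sees individually.

```csharp
using System;

public static class SurrogateDemo
{
    // Hypothetical helper: returns the UTF-16 code units of one code point as hex.
    public static string[] Utf16Units(int codePoint)
    {
        string s = char.ConvertFromUtf32(codePoint);
        string[] units = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            units[i] = ((int)s[i]).ToString("X4");
        }
        return units;
    }

    public static void Main()
    {
        // U+3042 (HIRAGANA LETTER A) fits in a single code unit.
        Console.WriteLine(string.Join(" ", Utf16Units(0x3042)));   // 3042
        // U+20000 lies above 0xFFFF, so it becomes a high and a low surrogate.
        Console.WriteLine(string.Join(" ", Utf16Units(0x20000)));  // D840 DC00
    }
}
```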

      So the INVALIDCHARS regex should be "[^\x09\x0A\x0D\x20-\uFFFD]"
      (the above code is NOT TESTED)
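
      Since the proposal is untested, a minimal sketch for checking it may help. It compares the 1.2.10 pattern quoted in this report with a candidate that permits every code unit from 0x20 through 0xFFFD, which is what the reasoning above calls for. The class InvalidCharsDemo, its fields, and the Mask helper are hypothetical names for illustration; this is not the log4net implementation and not necessarily the fix that was shipped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvalidCharsDemo
{
    // Pattern quoted in this report (log4net 1.2.10).
    public static readonly Regex Current = new Regex(
        @"[^\x09\x0A\x0D\x20-\xFF\u00FF-\u07FF\uE000-\uFFFD]");

    // Assumed candidate: permits 0x20 through 0xFFFD, surrogates included.
    public static readonly Regex Candidate = new Regex(
        @"[^\x09\x0A\x0D\x20-\uFFFD]");

    // Hypothetical helper: replaces every char the pattern rejects with '?'.
    public static string Mask(Regex invalidChars, string text)
    {
        return invalidChars.Replace(text, "?");
    }

    public static void Main()
    {
        // Three kanji (U+65E5 U+672C U+8A9E), " log", then control char U+0001.
        string text = "\u65E5\u672C\u8A9E log\u0001";

        // The quoted pattern masks the kanji as well as the control char.
        Console.WriteLine(Mask(Current, text) == "??? log?");                      // True
        // The candidate keeps the kanji and masks only the control char.
        Console.WriteLine(Mask(Candidate, text) == "\u65E5\u672C\u8A9E log?");     // True
    }
}
```

      One trade-off of this candidate: it also lets unpaired surrogates through, so a stricter check would have to validate that each high surrogate is followed by a low one.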

          People

            Assignee: Unassigned
            Reporter: Atsushi Suzuki (atu)
            Votes: 1
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Original Estimate: 2h
                Remaining Estimate: 2h
                Time Spent: Not Specified