Description
With XmlLayoutSchemaLog4j, all Japanese characters (as far as I can see) are replaced with '?'
because the log4net.Util.Transform.INVALIDCHARS regular expression is not correct.
This issue may also affect other languages, such as Chinese or Korean.
http://issues.apache.org/jira/browse/LOG4NET-22 says that the permitted characters are
#x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF
but the regex for invalid characters is
private static Regex INVALIDCHARS=new Regex(@"[^\x09\x0A\x0D\x20-\xFF\u00FF-\u07FF\uE000-\uFFFD]",RegexOptions.Compiled);
so 0x0800 ~ 0xD7FF are wrongly treated as invalid characters.
0xD800 ~ 0xDFFF should also be permitted, because these code units are used as surrogate pairs to express 0x10000 ~ 0x10FFFF in UTF-16
(0xD800 ~ 0xDFFF are not valid Unicode characters on their own, but in UTF-16 they are valid as the halves of a surrogate pair).
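The behaviour described above can be reproduced with a small sketch. The pattern is copied verbatim from the line quoted in this report; the class name InvalidCharsDemo and the Mask helper are hypothetical and only stand in for whatever log4net does with a match (the report says the replacement is '?').

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of log4net.
public static class InvalidCharsDemo
{
    // Pattern copied from the report. It permits only
    // 0x09, 0x0A, 0x0D, 0x20-0x07FF and 0xE000-0xFFFD.
    public static readonly Regex Current = new Regex(
        @"[^\x09\x0A\x0D\x20-\xFF\u00FF-\u07FF\uE000-\uFFFD]");

    // Assumption: every matched UTF-16 code unit is replaced with '?'.
    public static string Mask(string text)
    {
        return Current.Replace(text, "?");
    }
}
```

With this pattern, Mask("\u3042") (HIRAGANA LETTER A) returns "?", and a supplementary character such as U+1F600, which is stored as the two code units "\uD83D\uDE00", returns "??", because .NET regular expressions match one UTF-16 code unit at a time.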
So the INVALIDCHARS regex should be "[^\x09\x0A\x0D\x20-\uFFFD]"
(the above code is NOT TESTED).
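As a rough check of the reasoning above, the sketch below tries a pattern that permits every code unit from 0x20 through 0xFFFD, surrogates included. That range is an assumption drawn from the argument in this report, not a confirmed log4net fix. One caveat is that such a pattern also lets unpaired surrogates through, so a stricter variant is shown next to it. The names CandidatePatterns, Candidate and Strict are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of log4net.
public static class CandidatePatterns
{
    // Assumed candidate: permit 0x09, 0x0A, 0x0D and 0x20-0xFFFD,
    // which includes the surrogate range 0xD800-0xDFFF.
    public static readonly Regex Candidate = new Regex(
        @"[^\x09\x0A\x0D\x20-\uFFFD]");

    // Stricter variant (also an assumption): additionally flag a low
    // surrogate with no high surrogate before it, and a high surrogate
    // with no low surrogate after it.
    public static readonly Regex Strict = new Regex(
        @"(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]" +
        @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])" +
        @"|[^\x09\x0A\x0D\x20-\uFFFD]");

    public static string MaskCandidate(string text)
    {
        return Candidate.Replace(text, "?");
    }

    public static string MaskStrict(string text)
    {
        return Strict.Replace(text, "?");
    }
}
```

Both variants leave Japanese text and well-formed surrogate pairs untouched while still masking control characters and 0xFFFE / 0xFFFF. They differ only on unpaired surrogates, which Candidate keeps and Strict replaces.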