XMLLayout and HTMLLayout assume that the encoding of any associated writer is either UTF-8 or UTF-16. If an encoding is not explicitly specified in the appender, the default platform encoding is used, which is highly unlikely to be UTF-8 or UTF-16 on Windows. An encoding mismatch produces XML documents that are not well-formed as soon as a non-US-ASCII character is emitted in the log. The proposed resolution is to add a new interface

    interface EncodingSensitiveLayout {
        /* @return encoding selected by layout */
        String setEncoding(final String proposedEncoding);
    }

to be implemented by XMLLayout and HTMLLayout. In WriterAppender.activateOptions, if the layout implements EncodingSensitiveLayout, it would be passed the proposed encoding and would have a chance either to modify its behavior to be consistent with that encoding or to override the choice of encoding.
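A rough sketch of how the proposed negotiation might look, written against plain JDK classes rather than log4j itself. EncodingSensitiveLayout is the interface proposed above; SketchXmlLayout, openWriter, and encode are hypothetical names used only for illustration, and the policy of falling back to UTF-8 is an assumption, not existing log4j behavior.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class EncodingNegotiationSketch {

    /** The interface proposed in this report. */
    public interface EncodingSensitiveLayout {
        /** @return encoding selected by layout */
        String setEncoding(final String proposedEncoding);
    }

    /** Hypothetical stand-in for XMLLayout; not the real log4j class. */
    public static class SketchXmlLayout implements EncodingSensitiveLayout {
        private String encoding = "UTF-8";

        @Override
        public String setEncoding(final String proposedEncoding) {
            try {
                // Normalize aliases such as "utf8" to the canonical charset name.
                String name = Charset.forName(proposedEncoding).name();
                if (name.equals("UTF-8") || name.startsWith("UTF-16")) {
                    encoding = name;      // accept the appender's proposal
                } else {
                    encoding = "UTF-8";   // assumed policy: override anything else
                }
            } catch (IllegalArgumentException e) {
                encoding = "UTF-8";       // null, malformed, or unsupported name
            }
            return encoding;
        }
    }

    /**
     * Roughly what the appender would do in activateOptions: propose an
     * encoding (the platform default when none is configured) and let an
     * encoding-sensitive layout accept or override it.
     */
    public static Writer openWriter(OutputStream out, Object layout, String configured) {
        String encoding = (configured != null) ? configured : Charset.defaultCharset().name();
        if (layout instanceof EncodingSensitiveLayout) {
            encoding = ((EncodingSensitiveLayout) layout).setEncoding(encoding);
        }
        return new OutputStreamWriter(out, Charset.forName(encoding));
    }

    /** Convenience helper: the bytes the negotiated writer emits for some text. */
    public static byte[] encode(String text, Object layout, String configured) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = openWriter(out, layout, configured)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The appender proposes ISO-8859-1; the layout overrides the choice.
        byte[] bytes = encode("caf\u00e9", new SketchXmlLayout(), "ISO-8859-1");
        System.out.println(bytes.length + " bytes written");
    }
}
```

With this arrangement the layout, not the platform default, has the final say on the writer's encoding, so a non-ASCII character is written in an encoding the layout expects.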
Added a notice to the javadoc for XMLLayout and HTMLLayout to use UTF-8 or UTF-16 encoding or risk corrupted documents (for log4j 1.2). Will not fix in log4j 1.3. log4j 2.0 will have distinct support for byte (as opposed to character) layouts, so this should not be a problem there. The proposed solution was an attempt to work around the lack of a direct byte layout mechanism.