Key: LOG4NET-22
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Niall Daley
Reporter: Nicko Cadell
Votes: 0
Watchers: 0
Log4net

XmlLayout allows output of invalid control characters

Created: 11/Apr/05 03:33 AM   Updated: 24/Aug/05 10:31 PM
Component/s: Appenders
Affects Version/s: 1.2.9
Fix Version/s: 1.2.10

Time Tracking:
Not Specified

Resolution Date: 24/Aug/05 10:31 PM


Description
XmlLayout allows output of invalid control characters.

Reported by Mike Blake-Knox with additional comments from Curt Arnold.


The XmlLayout encodes the character 0x1E as a standard XML numeric character reference (e.g. &#x1E;).

This character code lies in a range that XML 1.0 does not allow to appear either as an unencoded value or as a numeric character reference.
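One way to confirm this is to hand such a fragment to a conforming XML 1.0 parser. The sketch below uses Python's xml.etree.ElementTree (backed by expat) purely as an illustration; the helper name `parses` is hypothetical and nothing here is log4net code.

```python
import xml.etree.ElementTree as ET


def parses(fragment: str) -> bool:
    """Return True if the expat-based parser accepts the XML fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# 0x09 (tab) is an allowed character, so its numeric reference is accepted.
print(parses("<message>&#x9;</message>"))   # True
# 0x1E is outside the XML 1.0 Char production, so even the reference is rejected.
print(parses("<message>&#x1E;</message>"))  # False
```

The parser rejects the reference itself, so escaping alone cannot make 0x1E safe in XML 1.0 output.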

The valid character ranges are defined here in the XML recommendation:
http://www.w3.org/TR/REC-xml/#charsets

They are:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
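The production above translates directly into a code-point check. This is a minimal Python sketch; the function names are hypothetical and only illustrate how a layout could detect characters that XML 1.0 cannot carry.

```python
from typing import List


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def invalid_xml10_code_points(text: str) -> List[int]:
    """Code points in text that XML 1.0 cannot carry, raw or as &#...; references."""
    return [ord(ch) for ch in text if not is_xml10_char(ord(ch))]


print(invalid_xml10_code_points("record\x1eseparator\tok"))  # [30]
```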

Numeric character references cannot express characters outside these ranges.

The System.Xml.XmlTextWriter does not verify whether a Unicode character is valid in XML; it simply encodes the character as a numeric character reference if it cannot be expressed in the output encoding.

To complicate matters further, XML 1.1 allows additional, so-called restricted characters to be included in the output if they are encoded as numeric character references. These ranges are:

[#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
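For XML 1.1 a code point therefore falls into one of three classes: allowed literally, allowed only as a numeric character reference (RestrictedChar), or not expressible at all. The Python sketch below assumes the XML 1.1 Char production [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] together with the restricted ranges above; `classify_xml11` is a hypothetical name.

```python
def classify_xml11(cp: int) -> str:
    """Classify a code point for XML 1.1 output.

    'raw'       : may appear literally
    'ref-only'  : RestrictedChar, allowed only as a numeric character reference
    'forbidden' : cannot be expressed at all (for example NUL)
    """
    restricted = (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )
    if restricted:
        return "ref-only"
    is_char = (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )
    return "raw" if is_char else "forbidden"


print(classify_xml11(0x1E))  # ref-only
print(classify_xml11(0x0))   # forbidden
```

Even in XML 1.1, NUL remains forbidden, so some fallback representation is needed whichever version is targeted.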

See http://www.w3.org/TR/2004/REC-xml11-20040204/#charsets for details.

Nicko Cadell added a comment - 11/Apr/05 04:02 AM
The System.Xml.XmlTextWriter does not know which version of XML is being generated, and there is no API to configure it one way or the other. The XmlLayout does not generate a full XML document, only a fragment that must be included in a document.

If the XML output is included in an XML 1.1 document, then numeric character references in the additional ranges allowed by the 1.1 spec will be valid. However, enforcing this is outside the scope of log4net.

The XmlLayout must be told which XML version is being targeted, and it must default to 1.0, not 1.1.

For invalid characters such as 0x1e there are 3 possible solutions:

1) Discard the character from the output.

2) Replace the character with a numeric representation e.g. "0x1E".

3) Replace the character with an XML element e.g. <char code="30"/>
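The three options differ only in what is emitted for an invalid code point, so they can be sketched as interchangeable callbacks. This is a hypothetical Python illustration (`encode_text` and the three handlers are not log4net APIs); it assumes XML 1.0 rules and uses xml.sax.saxutils.escape for the ordinary characters.

```python
from typing import Callable
from xml.sax.saxutils import escape


def is_xml10_char(cp: int) -> bool:
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def encode_text(text: str, on_invalid: Callable[[int], str]) -> str:
    """Escape text for XML 1.0 content; each invalid code point is replaced by
    on_invalid(cp), which is inserted verbatim and must already be valid markup."""
    out = []
    for ch in text:
        cp = ord(ch)
        out.append(escape(ch) if is_xml10_char(cp) else on_invalid(cp))
    return "".join(out)


def discard(cp: int) -> str:       # option 1: drop the character
    return ""


def as_hex_text(cp: int) -> str:   # option 2: textual representation
    return f"0x{cp:02X}"


def as_element(cp: int) -> str:    # option 3: child element carrying the code
    return f'<char code="{cp}"/>'


msg = "A\x1eB"
print(encode_text(msg, discard))      # AB
print(encode_text(msg, as_hex_text))  # A0x1EB
print(encode_text(msg, as_element))   # A<char code="30"/>B
```

A message that already contains the literal text "0x1E" produces the same output under option 2, which is why that encoding cannot be reversed.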

Regardless of which output version (1.0 or 1.1) is selected, one of the above choices will need to be made. XML 1.1 does not allow the NUL (0x0) character to appear unencoded or as a numeric character reference, so it will always need to be represented in some other way.

Note that invalid characters cannot be included in a CDATA section either; however, some parsers incorrectly accept them there.

I favour option 3 because no information is lost. Options 1 and 2 both lose information: option 1 drops the character outright, and in option 2 the encoding is not reversible, since the replacement text cannot be distinguished from the same text occurring literally in the message. With option 3, the application reading the data needs additional smarts to pick up the encoded values in the elements, but all the original information is preserved. If the application just asks for the text nodes, ignoring the child elements, it gets the same result as with option 1.
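Both kinds of reader described above can be sketched with Python's ElementTree. The `<char code="N"/>` element comes from option 3; `decode_message` is a hypothetical helper showing the "additional smarts", while itertext() stands in for a reader that only collects text nodes.

```python
import xml.etree.ElementTree as ET


def decode_message(elem: ET.Element) -> str:
    """Rebuild the original text from an element whose invalid characters were
    written as <char code="N"/> children. Other child elements are skipped,
    but the text following them (their tail) is kept."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "char":
            parts.append(chr(int(child.get("code"))))
        parts.append(child.tail or "")
    return "".join(parts)


msg = ET.fromstring('<message>A<char code="30"/>B</message>')
print(repr(decode_message(msg)))       # 'A\x1eB'  (original preserved)
print(repr("".join(msg.itertext())))   # 'AB'      (text-only reader, same as option 1)
```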

Repository Revision Date User Message
ASF #312309 Wed Aug 24 13:26:38 UTC 2005 niall Fixes for LOG4NET-22 and LOG4NET-44 with associated tests.

Characters that cannot be expressed in XML are now masked with a user-specifiable character.
The message and property values may be Base64-encoded if this is undesirable.

The name of the properties node has been fixed to properties rather than global-properties.

Files Changed
MODIFY /logging/log4net/trunk/tests/src/log4net.Tests.csproj
MODIFY /logging/log4net/trunk/src/Util/Transform.cs
MODIFY /logging/log4net/trunk/src/Layout/XmlLayoutSchemaLog4j.cs
MODIFY /logging/log4net/trunk/src/Layout/XMLLayoutBase.cs
MODIFY /logging/log4net/trunk/src/Layout/XMLLayout.cs
ADD /logging/log4net/trunk/tests/src/Layout/XmlLayoutTest.cs

Niall Daley added a comment - 24/Aug/05 10:31 PM
By default, characters that cannot be represented in XML are now masked with a ?. This can be changed by setting InvalidCharReplacement to a different string. Alternatively, set Base64EncodeMessage or Base64EncodeProperties to true, as appropriate, to Base64-encode the data. This allows all values to be output safely.
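The two behaviours described in this comment can be sketched language-neutrally in Python. This is not the log4net implementation: `mask_invalid` and `base64_text` are hypothetical names, the `replacement` parameter merely mirrors the role of InvalidCharReplacement, and the sketch assumes the text is encoded as UTF-8 before Base64 encoding.

```python
import base64
from xml.sax.saxutils import escape


def is_xml10_char(cp: int) -> bool:
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def mask_invalid(text: str, replacement: str = "?") -> str:
    """Replace every character XML 1.0 cannot express with `replacement`,
    then escape markup characters (so a replacement like "<" stays safe)."""
    cleaned = "".join(ch if is_xml10_char(ord(ch)) else replacement for ch in text)
    return escape(cleaned)


def base64_text(text: str) -> str:
    """Base64-encode the UTF-8 bytes; the result is plain ASCII and always valid XML text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


msg = "A\x1eB & C"
print(mask_invalid(msg))      # A?B &amp; C
print(mask_invalid(msg, ""))  # AB &amp; C
print(base64_text(msg))
```

Masking keeps the output readable but loses the original character; Base64 preserves everything at the cost of requiring the reader to decode the value.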

Niall Daley made changes - 24/Aug/05 10:31 PM
Field           Original Value   New Value
Assignee                         Niall Daley [ niall ]
Fix Version/s                    1.2.10 [ 11128 ]
Resolution                       Fixed [ 1 ]
Status          Open [ 1 ]       Resolved [ 5 ]