  1. Log4net
  2. LOG4NET-22

XmlLayout allows output of invalid control characters

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.9
    • Fix Version/s: 1.2.10
    • Component/s: Appenders
    • Labels:
      None

      Description

      XmlLayout allows output of invalid control characters.

      Reported by Mike Blake-Knox with additional comments from Curt Arnold.

      The XmlLayout encodes the character 0x1e as &#x1E; using the standard XML numeric character reference.

      This character code is in a range that is not allowed to appear in XML 1.0, either as an unencoded value or as a numeric character reference.
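As a quick check of this claim, a conforming XML 1.0 parser should reject 0x1e in both forms. The sketch below uses Python's standard-library parser purely for illustration (it is not log4net or System.Xml), and the helper name `parses_as_xml10` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses_as_xml10(document: str) -> bool:
    """Return True if the parser accepts the document, False if it rejects it."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# 0x1E written as a numeric character reference, and as a raw character.
as_reference = "<message>before&#x1E;after</message>"
as_raw_char = "<message>before\x1eafter</message>"
# Tab (0x9) is in the allowed set, so a reference to it is accepted.
tab_reference = "<message>before&#x9;after</message>"

for label, doc in [("reference", as_reference),
                   ("raw", as_raw_char),
                   ("tab reference", tab_reference)]:
    print(label, "accepted" if parses_as_xml10(doc) else "rejected")
```

A consumer that receives the XmlLayout output embedded in an XML 1.0 document would hit the same kind of parse failure.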

      The valid character ranges are defined here in the XML recommendation:
      http://www.w3.org/TR/REC-xml/#charsets

      They are:

      #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

      Numeric character references are not able to express characters from outside these ranges.
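The ranges listed here translate directly into a membership test. The following is a minimal sketch in Python, not the code used by log4net; the names `XML10_RANGES`, `is_xml10_char` and `first_invalid_index` are made up for this example.

```python
# Inclusive code point ranges of the XML 1.0 Char production.
XML10_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml10_char(code_point: int) -> bool:
    """True if the code point may appear in an XML 1.0 document."""
    return any(low <= code_point <= high for low, high in XML10_RANGES)


def first_invalid_index(text: str):
    """Index of the first character that XML 1.0 forbids, or None if all are valid."""
    for index, char in enumerate(text):
        if not is_xml10_char(ord(char)):
            return index
    return None


print(is_xml10_char(0x1E))             # the character from this report
print(first_invalid_index("ab\x1ecd"))
```

Because a numeric character reference must also resolve to a character in these ranges, the same test applies whether the character is written literally or as a reference.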

      The System.Xml.XmlTextWriter does not verify whether a Unicode character is valid in XML, but it does encode the character as a numeric character reference if it cannot be expressed in the output encoding.

      To complicate matters further, XML 1.1 does allow additional, so-called restricted characters to be included in the output if they are encoded as numeric character references. These ranges are:

      [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]

      See http://www.w3.org/TR/2004/REC-xml11-20040204/#charsets for details.
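To make the XML 1.1 rules concrete, the sketch below sorts a code point into one of three groups: not allowed at all, allowed only as a numeric character reference (the restricted ranges), or allowed literally. It is an illustration in Python only. The `XML11_CHAR` ranges come from the Char production in the linked XML 1.1 recommendation rather than from this issue, and the function and label names are hypothetical.

```python
# XML 1.1 Char production (from the linked recommendation), inclusive ranges.
XML11_CHAR = ((0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF))

# XML 1.1 RestrictedChar ranges, as listed in the description.
XML11_RESTRICTED = (
    (0x1, 0x8),
    (0xB, 0xC),
    (0xE, 0x1F),
    (0x7F, 0x84),
    (0x86, 0x9F),
)


def _in_ranges(code_point: int, ranges) -> bool:
    return any(low <= code_point <= high for low, high in ranges)


def classify_xml11(code_point: int) -> str:
    """Classify a code point for XML 1.1 output."""
    if not _in_ranges(code_point, XML11_CHAR):
        return "invalid"          # cannot be written in any form, e.g. 0x0
    if _in_ranges(code_point, XML11_RESTRICTED):
        return "reference-only"   # must be a numeric character reference
    return "literal"              # may appear as-is


for cp in (0x0, 0x1E, 0x85, 0x41):
    print(hex(cp), classify_xml11(cp))
```

Note that 0x85 sits between two restricted ranges and is not itself restricted.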

        Activity

        nicko Nicko Cadell added a comment -

        The System.Xml.XmlTextWriter does not know which version of XML is being generated. There is no API to configure it one way or the other. The XmlLayout does not generate a full XML document, only a fragment which must be included in a document.

        If the XML output is included in an XML 1.1 document then the numeric character references in the additional ranges allowed by the 1.1 spec will be valid. However, this is outside the scope of log4net to enforce.

        The XmlLayout must be told which XML version is being targeted and must default to 1.0 not to 1.1.

        For invalid characters such as 0x1e there are 3 possible solutions:

        1) Discard the character from the output.

        2) Replace the character with a numeric representation e.g. "0x1E".

        3) Replace the character with an XML element e.g. <char code="30"/>

        Regardless of the output version selected (1.0 or 1.1), one of the above choices will need to be made. XML version 1.1 does not allow a NULL (0x0) character to appear unencoded or as a numeric character reference, therefore this will need to be represented in some way.

        Note that the invalid characters cannot be included in a CDATA block, however there are issues with some parsers that do allow them there when they should not.

        I favour option 3 above because information is not lost. In options 1 and 2 information is lost. In 2 the encoding is not reversible. With 3 the application reading the data requires additional smarts to pick up on the values encoded in the elements, but all the original information is preserved. If the app just asks for the text nodes, ignoring the child elements, then it will get back the same result as from 1.
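The three options can be compared side by side with a small sketch. This is an illustration in Python under stated assumptions, not a proposed log4net patch: it targets XML 1.0, and the names `INVALID_XML10`, `discard`, `as_hex_text` and `as_element` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Matches any single character outside the XML 1.0 Char production.
INVALID_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def discard(text: str) -> str:
    """Option 1: drop invalid characters."""
    return INVALID_XML10.sub("", text)


def as_hex_text(text: str) -> str:
    """Option 2: replace each invalid character with text such as 0x1E."""
    return INVALID_XML10.sub(lambda m: "0x%02X" % ord(m.group()), text)


def as_element(text: str) -> str:
    """Option 3: replace each invalid character with <char code="N"/>.

    Returns XML markup, so the valid text between matches is escaped.
    """
    parts = []
    last = 0
    for match in INVALID_XML10.finditer(text):
        parts.append(escape(text[last:match.start()]))
        parts.append('<char code="%d"/>' % ord(match.group()))
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


sample = "a\x1eb"
print(discard(sample))
print(as_hex_text(sample))
print(as_element(sample))

# Option 2 is ambiguous: a literal "0x1E" in the input looks the same afterwards.
print(as_hex_text("0x1E") == as_hex_text("\x1e"))

# Reading only the text nodes of option 3 gives the same result as option 1.
root = ET.fromstring("<message>" + as_element(sample) + "</message>")
print("".join(root.itertext()) == discard(sample))
```

A reader that understands the `char` elements can rebuild the original string from option 3, which is what makes it the only lossless choice of the three.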

        niall Niall Daley added a comment -

        By default, characters that cannot be represented in XML will now be masked by a ?. This can be changed by setting InvalidCharReplacement to a different string. Alternatively, set Base64EncodeMessage or Base64EncodeProperties to true, as appropriate, to Base64-encode the data. This allows all values to be output safely.
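The behaviour described in this comment can be pictured with the following Python sketch. It is not the log4net implementation and does not use the log4net API; `mask_invalid_chars` and `base64_encode_text` are hypothetical helpers, the default of "?" mirrors the comment, and UTF-8 is an assumed encoding for the Base64 step.

```python
import base64
import re

# Matches any single character outside the XML 1.0 Char production.
INVALID_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def mask_invalid_chars(text: str, replacement: str = "?") -> str:
    """Replace every character XML 1.0 forbids with the replacement string."""
    # A function is used so the replacement is inserted literally,
    # even if it contains backslashes.
    return INVALID_XML10.sub(lambda _match: replacement, text)


def base64_encode_text(text: str) -> str:
    """Base64-encode the UTF-8 bytes of the text (encoding is an assumption)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


message = "a\x1eb"
print(mask_invalid_chars(message))            # masked, original character lost
print(mask_invalid_chars(message, "[?]"))     # custom replacement string

encoded = base64_encode_text(message)
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded == message)                     # Base64 keeps the original data
```

Masking keeps the output readable but loses the original character, while Base64 preserves every value at the cost of readability.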


          People

          • Assignee:
            niall Niall Daley
            Reporter:
            nicko Nicko Cadell