Log4j 2 / LOG4J2-263

RFC5424 Layout (and Syslog Layout) uses platform encoding when no charset is specified in configuration

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0-beta6
    • Fix Version/s: 2.0-beta7
    • Component/s: Appenders, Core
    • Labels: None

      Description

      RFC5424 seems to require UTF-8 or US-ASCII as the character encoding.
      Should the 'charset' attribute be removed so that this is no longer configurable?

      Is there any specification for the Syslog format?

      See also discussion here: https://issues.apache.org/jira/browse/LOG4J2-255?focusedCommentId=13659559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13659559
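
The effect described in the title can be illustrated with a minimal Java sketch. It is not the Log4j 2 code; the class name `CharsetDefaultDemo` and the helper `encode` are hypothetical and exist only for this illustration. It shows that the same message produces different bytes depending on the charset, which is why falling back to the platform default makes the output depend on where the JVM runs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Log4j 2.
class CharsetDefaultDemo {
    // Encodes the message with an explicitly chosen charset.
    static byte[] encode(String message, Charset charset) {
        return message.getBytes(charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e; the escape keeps this source file ASCII-only.
        String message = "caf\u00e9";

        // The platform default varies between machines and JVM settings.
        Charset platform = Charset.defaultCharset();
        System.out.println("platform default: " + platform
                + ", bytes: " + encode(message, platform).length);

        // An explicit charset gives the same bytes everywhere:
        // 5 bytes in UTF-8 (the accented letter takes two), 4 bytes in ISO-8859-1.
        System.out.println("UTF-8 bytes: " + encode(message, StandardCharsets.UTF_8).length);
        System.out.println("ISO-8859-1 bytes: " + encode(message, StandardCharsets.ISO_8859_1).length);
    }
}
```

A receiver that expects UTF-8, as RFC 5424 recommends for MSG, would misread the ISO-8859-1 bytes, so an unspecified charset should arguably resolve to UTF-8 rather than to the platform default.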

        Activity

        Gary Gregory added a comment (edited)

        In this spec https://tools.ietf.org/html/rfc5424, I read:

        The character set used in STRUCTURED-DATA MUST be seven-bit ASCII in
        an eight-bit field as described in [RFC5234]. These are the ASCII
        codes as defined in "USA Standard Code for Information Interchange"
        [ANSI.X3-4.1968]. An exception is the PARAM-VALUE field (see
        Section 6.3.3), in which UTF-8 encoding MUST be used.

        But later:

        The character set used in MSG SHOULD be UNICODE, encoded using UTF-8
        as specified in [RFC3629]. If the syslog application cannot encode
        the MSG in Unicode, it MAY use any other encoding.

        But UTF-8 is required to be in the JRE, so we should never have to worry about this path.

        So, we should not allow any charsets to be configured.

        Remko Popma added a comment -

        I had some trouble parsing the RFC text, that's why I said "seems to require"...
        But it looks like you reached the same conclusion as I did.

        Gary Gregory added a comment -

        I do not know enough about this appender and layout to know if we have a problem. The RFC shows two different encodings in use, but I am not sure how that corresponds to our code. It sure sounds like we are not doing it right by (1) letting one charset be configured and (2) using one charset to encode one LogEvent.
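
One way to look at the "two encodings" question: seven-bit ASCII text encodes to the same bytes in US-ASCII and in UTF-8, so encoding a whole record as UTF-8 can satisfy both RFC 5424 requirements as long as the ASCII-only fields contain only ASCII characters. The sketch below illustrates this under that assumption. The class `Rfc5424CharsetSketch` and its helpers are hypothetical and do not reflect the actual layout code; the SD-ID is taken from the examples in RFC 5424.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Log4j 2.
class Rfc5424CharsetSketch {
    // True if every char is in the seven-bit ASCII range (0x00 to 0x7F).
    static boolean isSevenBitAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // True if encoding as US-ASCII and as UTF-8 yields identical bytes.
    static boolean sameInAsciiAndUtf8(String s) {
        return Arrays.equals(
                s.getBytes(StandardCharsets.US_ASCII),
                s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String sdId = "exampleSDID@32473";      // ASCII-only field
        String paramValue = "Gr\u00fc\u00dfe";  // PARAM-VALUE with non-ASCII characters

        // ASCII-only text is byte-identical in both encodings.
        System.out.println(isSevenBitAscii(sdId) + " " + sameInAsciiAndUtf8(sdId));

        // Non-ASCII text is not, so it needs UTF-8 to survive intact.
        System.out.println(isSevenBitAscii(paramValue) + " " + sameInAsciiAndUtf8(paramValue));
    }
}
```

Under this reading, a single UTF-8 encoder per LogEvent is enough, provided the layout validates or escapes the ASCII-only fields before encoding.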

        Ralph Goers added a comment -

        RFC5424Layout has been fixed in revision 1487936 so that a charset can no longer be specified. SyslogLayout still allows one, as RFC 3164 does not restrict the message to a specific character set.

        Please verify and close.


          People

          • Assignee: Unassigned
          • Reporter: Remko Popma
          • Votes: 0
          • Watchers: 3
