Log4net
  1. Log4net
  2. LOG4NET-370

RemoteSyslogAppender doesn't properly handle newline in log message

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.12
    • Component/s: Appenders
    • Labels:
      None
    • Environment:
      All

      Description

      RemoteSyslogAppender sends log messages containing newline characters to syslog, however the syslog RFC 3164 states that the message portion of the syslog packet can only contain visible ASCII characters and spaces. This causes multiline log messages to render on a single line in at least some instances.

      The solution appears to be sending each line of the log message as a separate syslog packet. I have modified RemoteSyslogAppender to support this behavior and will provide a patch file.

      Here is the relevant portion of the RFC:

      4.1.3 MSG Part of a syslog Packet

      The MSG part will fill the remainder of the syslog packet. This will
      usually contain some additional information of the process that
      generated the message, and then the text of the message. There is no
      ending delimiter to this part. The MSG part of the syslog packet
      MUST contain visible (printing) characters. The code set
      traditionally and most often used has also been seven-bit ASCII in an
      eight-bit field like that used in the PRI and HEADER parts. In this
      code set, the only allowable characters are the ABNF VCHAR values
      (%d33-126) and spaces (SP value %d32). However, no indication of the
      code set used within the MSG is required, nor is it expected. Other
      code sets MAY be used as long as the characters used in the MSG are
      exclusively visible characters and spaces similar to those described
      above. The selection of a code set used in the MSG part SHOULD be
      made with thoughts of the intended receiver. A message containing
      characters in a code set that cannot be viewed or understood by a
      recipient will yield no information of value to an operator or
      administrator looking at it.

      http://www.ietf.org/rfc/rfc3164.txt

      1. RemoteSyslogAppender.cs.revised4.patch
        5 kB
        Jarrod Alexander
      2. RemoteSyslogAppender.cs.revised3.patch
        6 kB
        Jarrod Alexander
      3. RemoteSyslogAppender.cs.revised2.patch
        5 kB
        Jarrod Alexander
      4. RemoteSyslogAppender.cs.revised.patch
        6 kB
        Dominik Psenner
      5. RemoteSyslogAppender.cs.patch
        3 kB
        Jarrod Alexander

        Activity

        Hide
        Dominik Psenner added a comment -

        Hi Jarrod,

        I am not so happy about this yank:

        + string[] lines = message.Split(new string[]

        { Environment.NewLine }

        , StringSplitOptions.RemoveEmptyEntries);

        I believe that we should patch this appender to be RFC compliant, but according to this different variations of linefeed characters like "\r", "\n" or "\r\n" could still make it into the syslog. Do you agree?

        Show
        Dominik Psenner added a comment - Hi Jarrod, I am not so happy about this yank: + string[] lines = message.Split(new string[] { Environment.NewLine } , StringSplitOptions.RemoveEmptyEntries); I believe that we should patch this appender to be RFC compliant, but according to this different variations of linefeed characters like "\r", "\n" or "\r\n" could still make it into the syslog. Do you agree?
        Hide
        Dominik Psenner added a comment -

        This is the patch I come up with. Please feed back your opinion.

        Show
        Dominik Psenner added a comment - This is the patch I come up with. Please feed back your opinion.
        Hide
        Jarrod Alexander added a comment - - edited

        Dominik,

        If the user was logging data generated from within the program (stacktrace for example), the linefeed characters will be equal to Environment.Newline, so there won't be a problem. If the user logs text from some external source, it could contain other linefeed characters, so I agree it is an issue. Obviously for me and for log4net, performance should be paramount. There are various options for achieving the desired result, but what do you think about a fix as simple as:

        string[] lines = message.Split(new char[]

        { '\r', '\n' }

        , StringSplitOptions.RemoveEmptyEntries);

        Do you have a better method in mind? (edit, I posted this before I saw your revised patch)

        Show
        Jarrod Alexander added a comment - - edited Dominik, If the user was logging data generated from within the program (stacktrace for example), the linefeed characters will be equal to Environment.Newline, so there won't be a problem. If the user logs text from some external source, it could contain other linefeed characters, so I agree it is an issue. Obviously for me and for log4net, performance should be paramount. There are various options for achieving the desired result, but what do you think about a fix as simple as: string[] lines = message.Split(new char[] { '\r', '\n' } , StringSplitOptions.RemoveEmptyEntries); Do you have a better method in mind? (edit, I posted this before I saw your revised patch)
        Hide
        Dominik Psenner added a comment - - edited

        Please don't look at the patch as if it was made from a mastermind. I just wrote down my first two thoughts. However, now that we patch this appender, I would like to patch it good. That means I would like to see some fancy cool dark magic that makes sure every message is RFC compliant. At least then we have good arguments when people complain about poor performance and the patch does what this issue is all about.

        Show
        Dominik Psenner added a comment - - edited Please don't look at the patch as if it was made from a mastermind. I just wrote down my first two thoughts. However, now that we patch this appender, I would like to patch it good. That means I would like to see some fancy cool dark magic that makes sure every message is RFC compliant. At least then we have good arguments when people complain about poor performance and the patch does what this issue is all about.
        Hide
        Jarrod Alexander added a comment -

        Hah, agreed.

        After looking your revision, I think my above suggestion might be better. The additional 3 Replace() calls means a total of 4 iterations through the string to handle the newlines. My above suggestion will only need one loop and should handle any combination of '\r' and '\n'. In the case of "\r\n" or "\n\r", there will be an additional empty split entry which will get tossed out by the RemoveEmptyEntries option. I've ran a quick test and it seems to handle all 4 cases. Thoughts?

        I do like that you wrapped the textwriter with a using statement however. In the case of StringWriter, dispose doesn't do anything, but I agree it should still be in a using.

        Show
        Jarrod Alexander added a comment - Hah, agreed. After looking your revision, I think my above suggestion might be better. The additional 3 Replace() calls means a total of 4 iterations through the string to handle the newlines. My above suggestion will only need one loop and should handle any combination of '\r' and '\n'. In the case of "\r\n" or "\n\r", there will be an additional empty split entry which will get tossed out by the RemoveEmptyEntries option. I've ran a quick test and it seems to handle all 4 cases. Thoughts? I do like that you wrapped the textwriter with a using statement however. In the case of StringWriter, dispose doesn't do anything, but I agree it should still be in a using.
        Hide
        Dominik Psenner added a comment -

        Looks good, accepted. Thanks for your efforts!

        Last but not least, I would like to do a second patch that covers everything but "(%d33-126)". Do you have an idea for that too?

        Show
        Dominik Psenner added a comment - Looks good, accepted. Thanks for your efforts! Last but not least, I would like to do a second patch that covers everything but "(%d33-126)". Do you have an idea for that too?
        Hide
        Dominik Psenner added a comment -

        So far fixed with revision: 1461433

        Show
        Dominik Psenner added a comment - So far fixed with revision: 1461433
        Hide
        Jarrod Alexander added a comment -

        I'll see if I can come up with a solution for the (%d33-126) issue too. Hopefully I'll have time to get it together today.

        Show
        Jarrod Alexander added a comment - I'll see if I can come up with a solution for the (%d33-126) issue too. Hopefully I'll have time to get it together today.
        Hide
        Jarrod Alexander added a comment -

        I have uploaded another patch which processes the message to only send valid syslog characters. I rewrote the method to do all of this in one loop over the message, so it should perform better than a string split and a separate processing step for the other characters. I also profiled using a compiled Regex to remove all invalid characters, but it was about 7 times slower on my machine than the solution I am submitting. What do you think?

        Show
        Jarrod Alexander added a comment - I have uploaded another patch which processes the message to only send valid syslog characters. I rewrote the method to do all of this in one loop over the message, so it should perform better than a string split and a separate processing step for the other characters. I also profiled using a compiled Regex to remove all invalid characters, but it was about 7 times slower on my machine than the solution I am submitting. What do you think?
        Hide
        Dominik Psenner added a comment - - edited

        I am interested in the hard facts. Would you please run each version (pre-1461433, 1461433, 1461433+last patch, 1461433+regex patch) of the code several times and post the result? That should give us a clue about the performance impact. I know this is a tedious work, I owe you a beer if we get the chance to meet some day.

        EDIT: you can also post all timings; i will average/min/max/Q.25/Q.75 the samples by myself.

        Show
        Dominik Psenner added a comment - - edited I am interested in the hard facts. Would you please run each version (pre-1461433, 1461433, 1461433+last patch, 1461433+regex patch) of the code several times and post the result? That should give us a clue about the performance impact. I know this is a tedious work, I owe you a beer if we get the chance to meet some day. EDIT: you can also post all timings; i will average/min/max/Q.25/Q.75 the samples by myself.
        Hide
        Jarrod Alexander added a comment -

        I'll see if I can get this together for you tonight or sometime tomorrow. The beer would be great, but in the meantime I'll take my name on the contributor's list http://logging.apache.org/log4net/team-list.html

        Show
        Jarrod Alexander added a comment - I'll see if I can get this together for you tonight or sometime tomorrow. The beer would be great, but in the meantime I'll take my name on the contributor's list http://logging.apache.org/log4net/team-list.html
        Hide
        Jarrod Alexander added a comment -

        To get a measurable result, I implemented each of the 4 methods and executed the complete Append method 10,000 times recorded the number of milliseconds it took to complete on my machine. I swapped in a fake UdpClient to remove network considerations from the result, and tested each of the 4 implementations using the same 100 byte and 1000 byte randomly generated ASCII string. The results are:

        Logging 100 byte random string 10000 times:
        Pre 1461433: 93ms
        1461433: 252ms
        Suggested: 242ms
        Regex: 358ms

        Logging 1000 byte random string 10000 times:
        Pre 1461433: 171ms
        1461433: 1516ms
        Suggested: 1467ms
        Regex: 2708ms

        The suggested timing above is my suggested (last patch) version, with a couple of minor tweaks. It appears to be slightly faster than even the 1461433 version, although any amount of string processing is nearly an order of magnitude slower than the original implementation. Still, this should be plenty fast considering it can log 1KB in ~0.1467ms. If you would like, I can attach my benchmark code.

        Show
        Jarrod Alexander added a comment - To get a measurable result, I implemented each of the 4 methods and executed the complete Append method 10,000 times recorded the number of milliseconds it took to complete on my machine. I swapped in a fake UdpClient to remove network considerations from the result, and tested each of the 4 implementations using the same 100 byte and 1000 byte randomly generated ASCII string. The results are: Logging 100 byte random string 10000 times: Pre 1461433: 93ms 1461433: 252ms Suggested: 242ms Regex: 358ms Logging 1000 byte random string 10000 times: Pre 1461433: 171ms 1461433: 1516ms Suggested: 1467ms Regex: 2708ms The suggested timing above is my suggested (last patch) version, with a couple of minor tweaks. It appears to be slightly faster than even the 1461433 version, although any amount of string processing is nearly an order of magnitude slower than the original implementation. Still, this should be plenty fast considering it can log 1KB in ~0.1467ms. If you would like, I can attach my benchmark code.
        Hide
        Dominik Psenner added a comment -

        I guess that 24bytes would be the message header and thus 1000bytes is the maximum allowed message length (RFC 3164 section 4.1)? Since your suggested patch is not slower than the one we applied already I'll commit it. Anyway, I did not expect that being RFC compliant would come without a price. Thanks Jarrod for your efforts!

        Show
        Dominik Psenner added a comment - I guess that 24bytes would be the message header and thus 1000bytes is the maximum allowed message length (RFC 3164 section 4.1)? Since your suggested patch is not slower than the one we applied already I'll commit it. Anyway, I did not expect that being RFC compliant would come without a price. Thanks Jarrod for your efforts!
        Hide
        Robert Sindal added a comment -

        Hi

        I'm using the RemoteSyslogAppender and would love to have this fix.

        I don't have a build log4net environment setup, so I was wondering when the next release is?

        And thank you both, Dominik and Jarrod

        Show
        Robert Sindal added a comment - Hi I'm using the RemoteSyslogAppender and would love to have this fix. I don't have a build log4net environment setup, so I was wondering when the next release is? And thank you both, Dominik and Jarrod
        Hide
        Dominik Psenner added a comment -

        Thanks for the KUDOS. It will be released when it is done. If you need it any time sooner, build it from source. It is not an impossible task.

        Show
        Dominik Psenner added a comment - Thanks for the KUDOS. It will be released when it is done. If you need it any time sooner, build it from source. It is not an impossible task.
        Hide
        Dominik Psenner added a comment - - edited

        Fixed with revision: 1467176

        If you think we should fix the 1000 characters max-length limit too, reopen the issue, otherwise close it.

        Show
        Dominik Psenner added a comment - - edited Fixed with revision: 1467176 If you think we should fix the 1000 characters max-length limit too, reopen the issue, otherwise close it.

          People

          • Assignee:
            Dominik Psenner
            Reporter:
            Jarrod Alexander
          • Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development