Flume / FLUME-1666

Syslog source strips timestamp and hostname from log message body

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v1.2.0, v1.3.0
    • Fix Version/s: v1.5.0
    • Component/s: Sinks+Sources
    • Labels:
      None
    • Environment:

      This occurs with Flume all the way up through 1.3.0.

      Description

      The syslog source parses incoming syslog messages. In the process, it strips the timestamp and hostname from each log message and stores them as Event headers instead of leaving them in the body.

      Thus, a syslog message that would normally look like so (when written via rsyslog or syslogd):

      Wed Oct 24 09:18:01 UTC 2012 someserver /USR/SBIN/CRON[26981]: (root) CMD (/usr/local/sbin/somescript)
      

      appears in Flume output as:

      /USR/SBIN/CRON[26981]: (root) CMD (/usr/local/sbin/somescript)
      
      1. FLUME-1666-SyslogTextSerializer.patch
        4 kB
        Josh West
      2. FLUME-1666-4.patch
        14 kB
        Jeff Lord
      3. FLUME-1666-3.patch
        14 kB
        Jeff Lord
      4. FLUME-1666-2.patch
        14 kB
        Jeff Lord
      5. FLUME-1666-1.patch
        13 kB
        Jeff Lord

          Activity

          Josh West added a comment -

          There is more than one way to solve this issue. One way is to patch the Syslog source so that it no longer strips the timestamp and hostname from each log message. The other way is to write a serializer which adds this data back into the log message.

          As I'm new to Java and didn't want to muck with how the Syslog source currently works for existing users, I chose the path of a serializer. I'm attaching a patch which provides a SyslogTextSerializer, based on the BodyTextSerializer. It simply adds the timestamp and hostname headers into the log message body, if they exist.
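The serializer approach described above can be sketched roughly as follows. This is a minimal illustration, not the attached patch: the class name `HeaderPrefixSketch`, the method `withHeaders`, and the header keys `timestamp` and `host` are assumptions made for the example, and header values are treated as opaque strings. It deliberately avoids the Flume serializer API and only shows the idea of prepending the two headers to the body when they exist.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class HeaderPrefixSketch {
    // Hypothetical header keys; the keys the syslog source really uses may differ.
    static final String TIMESTAMP = "timestamp";
    static final String HOSTNAME = "host";

    /** Returns the body prefixed with the timestamp and hostname headers, if present. */
    static byte[] withHeaders(Map<String, String> headers, byte[] body) {
        StringBuilder line = new StringBuilder();
        String ts = headers.get(TIMESTAMP);
        if (ts != null) {
            line.append(ts).append(' ');
        }
        String host = headers.get(HOSTNAME);
        if (host != null) {
            line.append(host).append(' ');
        }
        // Decode and encode with an explicit charset instead of the platform default.
        line.append(new String(body, StandardCharsets.UTF_8));
        return line.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

If either header is missing, the corresponding prefix is simply skipped, so an event without headers passes through with its body unchanged.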

          Brock Noland added a comment -

          Looks pretty good. Should we be using Charsets.UTF_8 in the getBytes() calls?

          I am not sure, but I wonder if we could name it something like HostTimestampTextSerializer, since it's not syslog specific? Not 100% sure I like that name, but something generic could be good.

          Brock

          Mike Percy added a comment -

          Why don't we make stripping hostname and timestamp optional in the syslog sources? I think that's a better solution.

          The problem with the solution here is that syslog source supports several different timestamp input formats, so in order to be comprehensive we would have to support them all. Better to just add a flag to the syslog sources to use the whole line as the Event body instead of just the parsed "content" portion.

          Jeff Lord added a comment -

          +1 for an optional setting to strip hostname and timestamp or not.

          Jeff Lord added a comment -

          Attaching a patch which introduces a boolean keepFields, which defaults to false. When set to true, this will preserve the timestamp and hostname in the body of the event. Additionally, I have added a test for SyslogTcpSource.
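The effect of such a flag can be illustrated with a small standalone sketch. It is not the code from the patch: the class `KeepFieldsSketch`, the method `eventBody`, and the single regular expression (a `<priority>` prefix followed by an `Mmm dd HH:mm:ss` timestamp, a hostname, and the content) are assumptions for the example. The real source accepts several timestamp formats, and whether the priority prefix stays in the body is also assumed here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeepFieldsSketch {
    // Assumed line layout: <pri>Mmm dd HH:mm:ss hostname content
    private static final Pattern LINE = Pattern.compile(
        "^<(\\d{1,3})>(\\w{3}\\s+\\d{1,2} \\d{2}:\\d{2}:\\d{2}) (\\S+) (.*)$");

    /**
     * Returns the text that would become the event body.
     * keepFields = false: only the parsed content portion (the stripping behaviour).
     * keepFields = true: the whole line, left unmodified.
     */
    static String eventBody(String line, boolean keepFields) {
        Matcher m = LINE.matcher(line);
        if (!m.matches() || keepFields) {
            // Unparseable lines and keepFields = true both keep the original text.
            return line;
        }
        return m.group(4);
    }
}
```

With the line `<78>Oct 24 09:18:01 someserver /USR/SBIN/CRON[26981]: (root) CMD (/usr/local/sbin/somescript)`, the sketch returns only the part starting at `/USR/SBIN/CRON` when `keepFields` is false, and the full line when it is true.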

          Jeff Lord added a comment -

          Syslog source strips timestamp and hostname from log message body

          Jeff Lord added a comment -

          Review Feedback Incorporated.
          New Patch Attached.
          Thanks for the review Mike!

          Mike Percy added a comment -

          +1

          Mike Percy added a comment -

          Committed to trunk and flume-1.5 branches. Thanks for the patch Jeff!

          Hudson added a comment -

          SUCCESS: Integrated in flume-trunk #511 (See https://builds.apache.org/job/flume-trunk/511/)
          FLUME-1666. Syslog source strips timestamp and hostname from log message body (mpercy: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=1f95219ea6f87173018bde126a3485575a8ee252)

          • flume-ng-core/src/main/java/org/apache/flume/source/SyslogTcpSource.java
          • flume-ng-core/src/test/java/org/apache/flume/source/TestSyslogUtils.java
          • flume-ng-core/src/main/java/org/apache/flume/source/SyslogUtils.java
          • flume-ng-doc/sphinx/FlumeDeveloperGuide.rst
          • flume-ng-core/src/main/java/org/apache/flume/source/SyslogSourceConfigurationConstants.java
          • flume-ng-doc/sphinx/FlumeUserGuide.rst
          • flume-ng-core/src/test/java/org/apache/flume/source/TestSyslogUdpSource.java

          Mike Peterson added a comment (edited) -

          I'm not very familiar with Java, but it looks like the old patch (from 24/Oct/12 05:36) drops some of the information from the date header when it parses the header with the Java Date class. Most notably, I believe it drops milliseconds.

          Is this really the issue or am I looking at something wrong? If so, has this been fixed with the new patch? i.e. does all the information that goes into the header get added back to the message body and nothing is dropped?

          Edit: It also looks like it gives the wrong time zone information. Here's an example of a message from a syslog source that I captured by listening with netcat:
          <166>2013-10-10T13:27:11.935Z
          Here's a timestamp from flume syslog source that came in a little earlier
          Wed Oct 09 13:33:22 EDT 2013
          Note the millisecond (935) has been dropped and it's been read as EDT instead of UTC (Z) time.
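The symptom described here matches how `java.util.Date.toString()` behaves in general: it prints with second precision and in the JVM's default time zone. The sketch below demonstrates that with the timestamp from the netcat capture. It does not claim that the old patch used exactly this call; the class `DateToStringSketch` and the method `viaDate` are made up for the illustration, and the default time zone is switched temporarily only to make the result reproducible.

```java
import java.time.Instant;
import java.util.Date;
import java.util.TimeZone;

public class DateToStringSketch {
    /** Formats an ISO-8601 instant via Date.toString() as if the JVM default zone were 'zone'. */
    static String viaDate(String isoTimestamp, TimeZone zone) {
        TimeZone saved = TimeZone.getDefault();
        TimeZone.setDefault(zone);
        try {
            // The Date object still holds the milliseconds; toString() just does not print them.
            return Date.from(Instant.parse(isoTimestamp)).toString();
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        // prints "Thu Oct 10 09:27:11 EDT 2013"
        System.out.println(viaDate("2013-10-10T13:27:11.935Z",
                                   TimeZone.getTimeZone("America/New_York")));
    }
}
```

The instant itself is unchanged by this round trip. Only the textual form loses the `.935` and swaps the `Z` for the local zone abbreviation, which is why keeping the original message text avoids the problem entirely.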

          Mike Percy added a comment -

          The latest patch doesn't modify the original message at all.

          Mike Percy added a comment -

          Oops, I just noticed I missed a file when committing this - the new syslog tcp source test. Committing that now.

          Hudson added a comment -

          SUCCESS: Integrated in flume-trunk #515 (See https://builds.apache.org/job/flume-trunk/515/)
          FLUME-1666. Oops, forgot new test in previous commit (mpercy: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=730c822c8fd3c393558ee63b48c82bb5a0763266)

          • flume-ng-core/src/test/java/org/apache/flume/source/TestSyslogTcpSource.java

            People

            • Assignee:
              Jeff Lord
              Reporter:
              Josh West
            • Votes:
              1
              Watchers:
              6
