Apache NiFi / NIFI-5324

Implement syslog record readers


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented

    Description

      Creating this Jira based on discussion with ottobackwards in the NiFi HipChat room...

      We currently have ListenSyslog with optional parsing when batch size is 1, and ParseSyslog which also assumes 1 message per flow file. There is also ListenTCPRecord and ListenUDPRecord which can be used with a GrokReader to read log messages from the respective network connections.

      The common reason to parse syslog messages is to extract a field from the message into an attribute and then use that attribute to make decisions such as routing or filtering.

      Since the "1 message per flow file" pattern is generally something we try to avoid, it would be nice if we could keep batches of syslog messages together in a single flow file and then use record processors to process the batches.

      For example, if we had a syslog record reader, we could use PartitionRecord to divide a flow file of many syslog records into smaller groups based on some field in the message. Each group could then be routed somewhere based on the group value.
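
As a rough illustration of the partitioning idea, the sketch below groups a batch of already-parsed syslog records by one field. It is plain Python, not NiFi code: `partition_records` and the field names `hostname`, `severity`, and `body` are assumptions made for the example.

```python
from collections import defaultdict


def partition_records(records, field):
    """Group records by the value of one field, keeping input order within each group."""
    groups = defaultdict(list)
    for record in records:
        # Records that lack the field are collected under the key None.
        groups[record.get(field)].append(record)
    return dict(groups)


# Hypothetical batch, standing in for one flow file of parsed syslog records.
batch = [
    {"hostname": "web01", "severity": 3, "body": "disk failure"},
    {"hostname": "db01", "severity": 6, "body": "checkpoint complete"},
    {"hostname": "web01", "severity": 6, "body": "request served"},
]

groups = partition_records(batch, "hostname")
```

Each key in `groups` plays the role of the group value that a downstream routing decision could be based on.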

      Another example would be to use QueryRecord to run a SQL query that selects specific syslog messages based on a field in the message.
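
To make the query idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for QueryRecord. QueryRecord exposes the incoming records as a table named FLOWFILE; the column names (`hostname`, `severity`, `body`) and the helper `select_records` are assumptions for illustration only.

```python
import sqlite3


def select_records(records, where_clause, params=()):
    """Load records into an in-memory FLOWFILE table and return the rows matching a WHERE clause."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE FLOWFILE (hostname TEXT, severity INTEGER, body TEXT)")
    conn.executemany("INSERT INTO FLOWFILE VALUES (:hostname, :severity, :body)", records)
    # The WHERE clause is interpolated for illustration only; values go through bound parameters.
    rows = conn.execute(f"SELECT * FROM FLOWFILE WHERE {where_clause}", params).fetchall()
    conn.close()
    return [dict(row) for row in rows]


# Hypothetical batch of parsed syslog records.
batch = [
    {"hostname": "web01", "severity": 3, "body": "disk failure"},
    {"hostname": "db01", "severity": 6, "body": "checkpoint complete"},
]

# Lower severity numbers are more urgent in syslog, so this keeps errors and worse.
errors = select_records(batch, "severity <= ?", (3,))
```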

      It would also make it easy to convert syslog messages to a structured format using ConvertRecord with a syslog reader and a writer like JSON or Avro.
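
The reader/writer pairing can be sketched as follows. This is not the NiFi reader: `read_syslog` is a hypothetical, simplified parser for the older BSD-style layout (`<PRI>TIMESTAMP HOSTNAME MESSAGE`), and the output field names are assumptions.

```python
import json
import re

# Simplified pattern for lines such as "<34>Oct 11 22:14:15 mymachine su: ...".
SYSLOG_LINE = re.compile(
    r"^<(\d{1,3})>([A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) (\S+) (.*)$"
)


def read_syslog(line):
    """Hypothetical reader: parse one line into a dict, or return None if it does not match."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        return None
    priority, timestamp, hostname, body = match.groups()
    return {"priority": int(priority), "timestamp": timestamp,
            "hostname": hostname, "body": body}


def convert(lines, reader=read_syslog, writer=json.dumps):
    """Run every line through the reader, skip unparseable lines, and write the rest."""
    records = (reader(line) for line in lines)
    return [writer(record) for record in records if record is not None]


batch = [
    "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8",
    "not a syslog line",
]
json_lines = convert(batch)
```

Swapping `writer` for another serializer changes the output format without touching the parsing step, which is the appeal of the reader/writer split.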

      We would likely want two syslog record readers, one for each of the RFC formats.
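
Assuming the two formats meant here are RFC 3164 and RFC 5424, the sketch below shows why one parser per format is natural: the headers differ in shape, and RFC 5424 carries extra fields. The regular expressions are deliberately simplified (the real grammars are stricter), and the group names are assumptions.

```python
import re

# RFC 3164 style: <PRI>Mmm dd hh:mm:ss HOSTNAME MESSAGE
RFC3164 = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) (?P<body>.*)$"
)

# RFC 5424 style: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
RFC5424 = re.compile(
    r"^<(?P<priority>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<proc_id>\S+) (?P<msg_id>\S+) "
    r"(?P<structured_data>-|\[.*\])(?: (?P<body>.*))?$"
)


def parse(line, pattern):
    """Parse one line with the given pattern; return a dict of fields, or None on mismatch."""
    match = pattern.match(line)
    if match is None:
        return None
    record = match.groupdict()
    priority = int(record["priority"])
    record["priority"] = priority
    # In both formats the priority value encodes facility * 8 + severity.
    record["facility"], record["severity"] = divmod(priority, 8)
    return record


old_style = parse("<34>Oct 11 22:14:15 mymachine su: 'su root' failed", RFC3164)
new_style = parse(
    "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed",
    RFC5424,
)
```

Neither pattern accepts a line in the other format, so a single reader would have to try both or be told which one to expect.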

      One aspect to consider is related to the schema used/produced by the reader... typically the readers/writers have a "Schema Access Strategy" where they can obtain a schema from a schema registry, or from flow file attributes, or something specific to the format like an embedded Avro schema.

      In this case, the schema is somewhat predetermined by the specific syslog reader, because the schema can be at most the fields produced by the reader when parsing the messages. So this may be a case where there is no schema access strategy, and there are predetermined schemas. It is sort of like the GrokReader, which creates a schema from the named fields in the expression, except that in this case there is no user-defined expression and the named fields are dictated by the parser.
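
A fixed, parser-dictated schema might look like the sketch below. The field list in `SYSLOG_SCHEMA` and the helper `to_record` are hypothetical and chosen only to illustrate that records can never contain more than the fields the parser produces.

```python
# Hypothetical fixed schema: field name -> Python type. There is nothing for a user to configure.
SYSLOG_SCHEMA = {
    "priority": int,
    "severity": int,
    "facility": int,
    "timestamp": str,
    "hostname": str,
    "body": str,
}


def to_record(parsed, schema=SYSLOG_SCHEMA):
    """Project parser output onto the fixed schema.

    Keys outside the schema are dropped; schema fields the parser did not
    produce become None; present values are coerced to the declared type.
    """
    record = {}
    for name, field_type in schema.items():
        value = parsed.get(name)
        record[name] = None if value is None else field_type(value)
    return record


record = to_record({"priority": "34", "hostname": "mymachine", "extra": "ignored"})
```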

      We may need to reuse syslog-related code that is in nifi-standard-processors, so it might require moving that code to nifi-processor-utils, or creating a new nifi-syslog-utils module.


            People

              Assignee: otto Otto Fowler
              Reporter: bbende Bryan Bende
