FLUME-1988

Add Support for Additional Deserializers for SpoolingDirectorySource

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: Docs, Sinks+Sources
    • org.apache.flume.serialization.RegexDelimiterDeSerializer
      org.apache.flume.serialization.TestRegexDelimiterDeSerializer

      Splits input into event "blocks" based on a configurable regex pattern.
      The pattern is included in the block it is split on, so generally the user will want to split on something that marks the end

    Description

      There are certain use cases for SpoolingDirectorySource where the events in the log file are not delimited with newline characters.

      Log files that contain stack traces, XML documents, or pretty-printed JSON strings typically have multiple newline characters within each event.

      For such files we can use alternative logic, such as specific characters, strings, or regular expressions, to determine when an event is complete.

      Hence I am proposing the following new deserializers, based on org.apache.flume.serialization.LineDeserializer:

      1. org.apache.flume.serialization.RegexDelimiterDeSerializer
        Allows the user to specify a regular expression that delimits events within the log file.
      2. org.apache.flume.serialization.CharSequenceDelimiterDeSerializer
        Allows the user to specify a comma-separated character sequence that delimits events within the log file.
        The user specifies the integer ASCII code of each character, and that sequence is used as the delimiter.
        For example, support for \r\n could be specified as 13,10.
        A list of codes is available at http://www.asciitable.com/
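
To make the two delimiter styles above concrete, here is a minimal sketch of the splitting logic on an in-memory string. It is not the attached implementation: the class and method names (DelimiterSketch, splitByRegex, decodeAsciiCodes, splitByCharSequence) are hypothetical, and a real deserializer would read from Flume's resettable input stream rather than a String. The regex variant keeps the matched delimiter at the end of each block, as the note on this issue describes; dropping the delimiter in the character-sequence variant is an assumption modeled on how LineDeserializer drops the newline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only; not part of Flume.
class DelimiterSketch {

    // Splits input into blocks, each ending with (and including) a match
    // of the delimiter regex. Any trailing text without a delimiter is
    // returned as a final, possibly incomplete, block.
    static List<String> splitByRegex(String input, String delimiterRegex) {
        List<String> events = new ArrayList<>();
        Matcher matcher = Pattern.compile(delimiterRegex).matcher(input);
        int start = 0;
        while (matcher.find()) {
            // Skip empty matches so that no empty blocks are emitted.
            if (matcher.end() > start) {
                events.add(input.substring(start, matcher.end()));
                start = matcher.end();
            }
        }
        if (start < input.length()) {
            events.add(input.substring(start));
        }
        return events;
    }

    // Turns a comma-separated list of ASCII codes such as "13,10"
    // into the character sequence it denotes ("\r\n").
    static String decodeAsciiCodes(String codes) {
        StringBuilder delimiter = new StringBuilder();
        for (String part : codes.split(",")) {
            int code = Integer.parseInt(part.trim());
            if (code < 0 || code > 127) {
                throw new IllegalArgumentException("Not an ASCII code: " + code);
            }
            delimiter.append((char) code);
        }
        return delimiter.toString();
    }

    // Splits input on the literal character sequence described by the
    // ASCII codes. Assumption: the delimiter itself is not kept.
    static List<String> splitByCharSequence(String input, String asciiCodes) {
        String delimiter = decodeAsciiCodes(asciiCodes);
        List<String> events = new ArrayList<>();
        int start = 0;
        int index;
        while ((index = input.indexOf(delimiter, start)) >= 0) {
            events.add(input.substring(start, index));
            start = index + delimiter.length();
        }
        if (start < input.length()) {
            events.add(input.substring(start));
        }
        return events;
    }
}
```

With this sketch, splitByCharSequence("one\r\ntwo\r\n", "13,10") yields the two events "one" and "two", and splitByRegex with the pattern </a>\n? splits a file of <a>...</a> elements into one block per closing tag.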

      We will also need to update the user guide with examples on how to configure and specify a custom deserializer.

      Attachments

        1. TestRegexDelimiterDeSerializer.java
          8 kB
          BitsOfInfo
        2. ResettableTestStringInputStream.java
          2 kB
          BitsOfInfo
        3. RegexDelimiterDeSerializer.java
          7 kB
          BitsOfInfo
        4. EventDeserializerType.java
          1 kB
          BitsOfInfo

          People

            Assignee: Israel Ekpo
            Reporter: Israel Ekpo
            Votes: 2
            Watchers: 7

            Dates

              Created:
              Updated: