  1. Apache NiFi
  2. NIFI-9203

Improve GrokReader to be able to handle complex Grok expressions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.15.0
    • Component/s: None
    • Labels: None

    Description

      The current GrokReader implementation cannot handle complex expressions, as the following scenario shows.

      Suppose we have a custom Grok pattern file:

      SYSLOGBASE_ISO8601 %{TIMESTAMP_ISO8601:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
      LINE_1 %{SYSLOGBASE}%{GREEDYDATA:message}
      LINE_2 %{SYSLOGBASE_ISO8601}%{GREEDYDATA:message}
      LINE (?:%{LINE_1}|%{LINE_2})
      

      If we set the Grok expression to:

      %{LINE}
      

      the service will fail for two reasons:

      1. LINE_1 and LINE_2 define the same labels. The service builds its schema by adding a field for every label it encounters, so the schema ends up with duplicate fields, which is not allowed.
      2. When the Grok library reads a record using a complex expression, it returns an array as the field value, because the same label can match more than once in such an expression. NiFi in turn tries to handle that array as a byte array.
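
      The two failure modes above can be illustrated with a minimal sketch in plain Java. It is not the change made for this issue and uses no NiFi or Grok library API: the class GrokCaptureSketch and its methods are hypothetical names, and picking the first non-null element of a list-valued capture is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of NiFi: shows one way to cope with
// repeated labels and list-valued captures.
final class GrokCaptureSketch {

    // Reason 1: the same label appears in several alternatives.
    // Keep only the first occurrence of each label, preserving order,
    // so a schema built from the result has no duplicate fields.
    static List<String> uniqueFieldNames(List<String> labels) {
        return new ArrayList<>(new LinkedHashSet<>(labels));
    }

    // Reason 2: a capture may arrive as a list when a label occurs in
    // more than one alternative. Assumption: use the first non-null entry.
    static Object normalizeValue(Object raw) {
        if (raw instanceof List) {
            for (Object candidate : (List<?>) raw) {
                if (candidate != null) {
                    return candidate;
                }
            }
            return null; // every alternative was empty
        }
        return raw; // scalar captures pass through unchanged
    }

    // Apply normalizeValue to every capture, preserving field order.
    static Map<String, Object> normalizeCaptures(Map<String, Object> captures) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : captures.entrySet()) {
            result.put(entry.getKey(), normalizeValue(entry.getValue()));
        }
        return result;
    }
}
```

      With labels collected from both LINE_1 and LINE_2, uniqueFieldNames yields each label once, and normalizeCaptures turns a list such as [null, "host1"] into the single value "host1".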

          People

            Assignee: Tamas Palfy (tpalfy)
            Reporter: Tamas Palfy (tpalfy)
            Votes: 0
            Watchers: 2

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 20m
