  1. Flume
  2. FLUME-2768

New ElasticSearch "structured" log behavior is wrong, and dangerous.

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels: None
    • Flags: Important

    Description

      The behavior introduced in Flume 1.6.0, which automatically treats every JSON log message as structured data (https://issues.apache.org/jira/browse/FLUME-2649, later fixed in https://issues.apache.org/jira/browse/FLUME-2126), is dangerous, under-documented, and cannot be controlled by a configuration switch.

      ElasticSearch Schema Change for the @message field
      The change is dangerous because it assumes that if you pass any JSON data, you pass only JSON data. Why? As soon as @message is sent as an Object, ElasticSearch refuses any later data for the @message field that arrives as a String. From that point on, those log events get dropped on the floor.
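
      To make the failure mode concrete, here is a minimal Python sketch of the decision as described above. It is not the Flume code: `message_value` is a hypothetical helper, and the rule "a body that parses as a JSON object is sent as an object" is an assumption drawn from this description.

```python
import json

def message_value(body: str):
    """Hypothetical stand-in for the 1.6.0 behavior described above:
    a body that parses as a JSON object is sent as an object,
    anything else is sent as a plain string."""
    try:
        parsed = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return body
    return parsed if isinstance(parsed, dict) else body

plain = message_value("user 42 logged in")
structured = message_value('{"user": 42, "action": "login"}')

# Two events from the same pipeline now give @message two different JSON types.
print(type(plain).__name__, type(structured).__name__)
```

      Whichever of the two shapes reaches the index first decides what the @message field will accept afterwards.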

      Assumes stable field-names and types
      Similar to the first issue, but more likely to bite you later on: this change assumes that your field names are stable and always contain the same type of data. That is, if you pass in "duration": "5 seconds", a field named duration is created in ElasticSearch with the "string" type. Now imagine another app writes a log message with "duration": 5.0. You're stuck: ElasticSearch cannot index that data and drops it on the floor because it violates the schema.
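
      The conflict can be illustrated with a toy model in which the first value seen for a field fixes its type. This is only a sketch: `ToyIndex` and `json_type` are hypothetical names, and the model is deliberately stricter than real ElasticSearch, which coerces some values (for example, numeric strings into numeric fields).

```python
def json_type(value):
    """Coarse type name, loosely modelled on ElasticSearch field types."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, dict):
        return "object"
    return "string"

class ToyIndex:
    """Toy model only: the first value seen for a field fixes its type."""

    def __init__(self):
        self.mapping = {}   # field name -> type name
        self.docs = []      # accepted documents
        self.rejected = []  # documents dropped because of a type conflict

    def index(self, doc):
        # Reject the whole document if any field disagrees with the mapping.
        for field, value in doc.items():
            expected = self.mapping.get(field)
            if expected is not None and expected != json_type(value):
                self.rejected.append(doc)
                return False
        # Otherwise record the types of any new fields and accept it.
        for field, value in doc.items():
            self.mapping.setdefault(field, json_type(value))
        self.docs.append(doc)
        return True

idx = ToyIndex()
idx.index({"duration": "5 seconds"})  # accepted: duration is now a string field
idx.index({"duration": 5.0})          # rejected by this toy model: type conflict
```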

      Finally ... it's an undocumented behavior change
      This is the big one: this change is not documented anywhere other than the commit messages. Also, you can't turn it off! At the very least, this new behavior should be optional, controlled by a configuration switch, and disabled by default.

      Lastly ... a fix?
      I plan to release the ElasticSearchLogStashStructuredEventSerializer that we use here at Nextdoor, which handles all of the above issues silently. It never touches the @message field, and it automatically handles all structured log data by dynamically renaming fields to include __<field type> in their name.
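
      The renaming idea can be sketched roughly as follows. This is not the Nextdoor serializer, which is not shown in this issue. The function names and the suffix values (`string`, `long`, `double`, and so on) are assumptions, since the description only specifies the `__<field type>` pattern.

```python
import json

def type_suffix(value):
    """Hypothetical suffixes; the description only specifies '__<field type>'."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "string"

def to_document(body: str) -> dict:
    """Keep @message as the raw string; add typed copies of top-level JSON fields."""
    doc = {"@message": body}
    try:
        parsed = json.loads(body)
    except ValueError:
        return doc  # not JSON: only @message is emitted
    if isinstance(parsed, dict):
        for name, value in parsed.items():
            if value is None:
                continue  # a null carries no type, so it is skipped here
            doc[f"{name}__{type_suffix(value)}"] = value
    return doc

a = to_document('{"duration": "5 seconds"}')
b = to_document('{"duration": 5.0}')
```

      Here `a` carries a `duration__string` field and `b` a `duration__double` field, so the two writers no longer compete for one field name, and @message stays a string in both. Nested objects are passed through unchanged in this sketch, so conflicts inside them are not addressed.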

            People

              Assignee: Unassigned
              Reporter: Matt Wise (diranged)