Metron (Retired) / METRON-517

Update elasticsearch bro templates for uri

Details

    • Type: Bug
    • Status: Done
    • Priority: Major
    • Resolution: Done
    • None
    • 0.3.0
    • None

    Description

      The bro uri field in [HTTP::Info](https://www.bro.org/sphinx/scripts/base/protocols/http/main.bro.html#type-HTTP::Info) can exceed the Lucene-imposed limit of 32766 bytes per term. Non-analyzed fields are treated as a single term, and we are setting uri as not_analyzed in the bro index template (https://github.com/apache/incubator-metron/blob/master/metron-deployment/roles/metron_elasticsearch_templates/files/es_templates/bro_index.template). The resolution options that I've been able to find are:
      1. Set "index" to "[no](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-index.html)", which will not add that field to the index, making it not queryable.
      2. Change the type to [binary](https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html), which is not stored by default and is not searchable.
      3. Use "[ignore_above](https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html)" to set a limit, above which strings are not indexed.
      4. Set the field as "analyzed", so that the value is tokenized into multiple terms rather than indexed as a single term.
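
To make option 3 concrete, here is a minimal sketch of what a uri mapping fragment with ignore_above might look like, built as a Python dict and printed as JSON. It assumes the Elasticsearch 2.x-style string mapping implied by the not_analyzed setting above. The helper name is hypothetical, and the 8191 value is an assumption rather than a setting taken from the Metron template: ignore_above counts characters while Lucene counts bytes, and a UTF-8 character occupies at most 4 bytes, so 32766 / 4 is a conservative ceiling.

```python
import json

# Lucene's maximum term length in bytes (IndexWriter.MAX_TERM_LENGTH).
MAX_TERM_BYTES = 32766
# A single UTF-8 encoded character occupies at most 4 bytes.
MAX_UTF8_BYTES_PER_CHAR = 4


def uri_mapping_with_ignore_above():
    """Hypothetical helper: build a not_analyzed uri mapping with ignore_above."""
    # ignore_above is a character count, so divide the byte limit by the
    # worst-case bytes per character to stay under it for any input.
    limit = MAX_TERM_BYTES // MAX_UTF8_BYTES_PER_CHAR
    return {
        "uri": {
            "type": "string",
            "index": "not_analyzed",
            "ignore_above": limit,
        }
    }


# Print the fragment as it might appear under "properties" in a template.
print(json.dumps(uri_mapping_with_ignore_above(), indent=2))
```

With this approach, values longer than the limit would still be present in the document source but would not be indexed, so they could not be matched by queries on uri.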

      Here is an example error message:

      ```
      [4]: index [bro_index_2016.10.25.21], type [bro_doc], id [AVf-iCuooLg3mHEm2PpH], message [java.lang.IllegalArgumentException: Document contains at least one immense term in field="uri" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[<redacted>]...', original message: bytes can be at most 32766 in length; got 38623]
      ```
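
Note that the lengths in this message are byte counts of the UTF-8 encoding, not character counts. The following is a small sketch of how a value could be checked against that limit before indexing. The function name and the sample URIs are made up for illustration and are not part of Metron or Elasticsearch.

```python
# Lucene's maximum term length in bytes (IndexWriter.MAX_TERM_LENGTH).
MAX_TERM_BYTES = 32766


def exceeds_term_limit(value: str) -> bool:
    """Return True if the UTF-8 encoding of value is longer than the term limit."""
    return len(value.encode("utf-8")) > MAX_TERM_BYTES


# An ASCII URI of 38623 bytes, the size reported in the error above.
long_uri = "/search?q=" + "a" * 38613
print(exceeds_term_limit(long_uri))        # True
print(exceeds_term_limit("/index.html"))   # False

# 20000 characters, but 40000 bytes once encoded, so it is over the limit too.
print(exceeds_term_limit("é" * 20000))     # True
```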

      Relevant Lucene documentation: https://lucene.apache.org/core/6_2_1/core/constant-values.html#org.apache.lucene.index.IndexWriter.MAX_TERM_LENGTH

            People

              Assignee: Jon Zeolla (jonzeolla)
              Reporter: Jon Zeolla (jonzeolla)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: 1h
                  Remaining: 1h
                  Logged: Not Specified