Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-4275

TrieTokenizer causes StringIOOBE when input is empty instead of returning no token

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 4.0
    • 4.1, 6.0
    • None
    • None

    Description

      When you use the admin interface and select a trie field (e.g. tint) and enter nothing into the field, the tokenizer should normally return no tokens. TrieTokenizer instead gets and SIOOBE because read() into the charbuffer returns -1 (end of stream). This is used to initialize the string's length...

      The problem is mostly affecting the analysis request handler and query parsing, but while indexing the values, Solr uses NumericField and not the tokenizer directly. The solr admin UI has the additional problem that you get a strange exception if you fill in the number on the left, but leave the query (right empty).

      The fix is to modify the tokenizer to behave like a real tokenizer:

      • correct the read loop to look like the one from KeywordTokenizer. The current loop is not guaranteed to work with unbuffered readers (Solr always uses StringReaders so this is no issue, but who knows)
      • if the resulting string is empty (total len == 0), set a boolean to false and make the incrementToken/close/end methods not delegate and return false.

      Attachments

        1. SOLR-4275.patch
          3 kB
          Uwe Schindler

        Activity

          People

            uschindler Uwe Schindler
            uschindler Uwe Schindler
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: