TrieTokenizer causes StringIOOBE when input is empty instead of returning no token

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.1, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      When you use the admin interface, select a trie field (e.g. tint), and enter nothing into the field, the tokenizer should return no tokens. Instead, TrieTokenizer throws a StringIndexOutOfBoundsException (SIOOBE), because read() into the char buffer returns -1 (end of stream) and that value is then used to initialize the string's length...
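      The failure mode described above can be reproduced without Solr. The following is a minimal sketch, not the actual TrieTokenizer code: the class name EmptyReadDemo and the method readNaively are hypothetical, and the buffer size is an arbitrary assumption. It only shows that using the result of a single read() as a string length breaks on empty input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class EmptyReadDemo {
    // Hypothetical helper that mimics the problematic pattern:
    // one read() call whose return value is used directly as the length.
    static String readNaively(Reader input) throws IOException {
        char[] buf = new char[32];
        int len = input.read(buf);      // -1 when the stream is already at its end
        return new String(buf, 0, len); // a negative count is rejected here
    }

    public static void main(String[] args) throws IOException {
        // Non-empty input works as long as it arrives in one read() call.
        System.out.println(readNaively(new StringReader("42")));
        try {
            readNaively(new StringReader(""));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("empty input failed: " + e.getClass().getSimpleName());
        }
    }
}
```

      With an empty StringReader, read() returns -1 and the String constructor rejects the negative length, which matches the exception named in this issue.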

      The problem mostly affects the analysis request handler and query parsing; when indexing values, Solr uses NumericField rather than the tokenizer directly. The Solr admin UI has the additional problem that you get a strange exception if you fill in the number on the left but leave the query (right) empty.

      The fix is to modify the tokenizer to behave like a real tokenizer:

      • Correct the read loop to look like the one in KeywordTokenizer. The current loop is not guaranteed to work with unbuffered readers (Solr always uses StringReaders, so this is not an issue in practice, but who knows).
      • If the resulting string is empty (total len == 0), set a boolean flag to false and make the incrementToken/close/end methods not delegate, with incrementToken returning false.
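
      The two changes listed above can be sketched in plain Java. This is an illustration under stated assumptions, not the committed patch: ReadLoopSketch, readFully, and SingleValueTokens are hypothetical names, the class does not extend any Lucene type, and it emits the whole value as a single token, whereas the real tokenizer delegates to a numeric token stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

public class ReadLoopSketch {
    // Reads until end of stream. A single read() may return fewer chars
    // than requested, so the loop continues until it sees -1.
    static String readFully(Reader input) throws IOException {
        char[] buf = new char[32];
        int upto = 0;
        while (true) {
            int n = input.read(buf, upto, buf.length - upto);
            if (n == -1) {
                break; // end of stream
            }
            upto += n;
            if (upto == buf.length) {
                buf = Arrays.copyOf(buf, buf.length * 2); // grow when full
            }
        }
        return new String(buf, 0, upto);
    }

    // Hypothetical stand-in for the tokenizer (assumption: one token per value).
    static final class SingleValueTokens {
        private final String value;
        private boolean hasValue; // false when the input was empty
        String term;              // last emitted token, if any

        SingleValueTokens(Reader input) throws IOException {
            value = readFully(input);
            hasValue = !value.isEmpty();
        }

        // Returns false immediately for empty input instead of delegating.
        boolean incrementToken() {
            if (!hasValue) {
                return false;
            }
            hasValue = false; // emit the value exactly once
            term = value;
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        SingleValueTokens empty = new SingleValueTokens(new StringReader(""));
        System.out.println("empty input has token: " + empty.incrementToken());

        SingleValueTokens number = new SingleValueTokens(new StringReader("42"));
        while (number.incrementToken()) {
            System.out.println("token: " + number.term);
        }
    }
}
```

      The buffer is grown before it can become full, so read() is never called with a zero length, and the loop does not depend on the reader delivering all characters in one call.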
      1. SOLR-4275.patch
        3 kB
        Uwe Schindler

        Activity

        Uwe Schindler added a comment -

        Simple patch. It also corrects the read logic so that it works with unbuffered input readers.

        Uwe Schindler added a comment -

        Committed to 4.x and trunk.

        Commit Tag Bot added a comment -

        [trunk commit] Uwe Schindler
        http://svn.apache.org/viewvc?view=revision&revision=1429401

        SOLR-4275: Fix TrieTokenizer to no longer throw StringIndexOutOfBoundsException in admin UI / AnalysisRequestHandler when you enter no number to tokenize

        Commit Tag Bot added a comment -

        [branch_4x commit] Uwe Schindler
        http://svn.apache.org/viewvc?view=revision&revision=1429402

        Merged revision(s) 1429401 from lucene/dev/trunk:
        SOLR-4275: Fix TrieTokenizer to no longer throw StringIndexOutOfBoundsException in admin UI / AnalysisRequestHandler when you enter no number to tokenize

        Commit Tag Bot added a comment -

        [trunk commit] Uwe Schindler
        http://svn.apache.org/viewvc?view=revision&revision=1429410

        SOLR-4275: Add test

        Commit Tag Bot added a comment -

        [branch_4x commit] Uwe Schindler
        http://svn.apache.org/viewvc?view=revision&revision=1429411

        Merged revision(s) 1429410 from lucene/dev/trunk:
        SOLR-4275: Add test

        Commit Tag Bot added a comment -

        [branch_4x commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1429417

        SOLR-4275: unbreak the build

        Commit Tag Bot added a comment -

        [trunk commit] Uwe Schindler
        http://svn.apache.org/viewvc?view=revision&revision=1429419

        SOLR-4275: Fix test. Sorry, the Solr build system did not recognize the test change without ant clean!?

        Commit Tag Bot added a comment -

        [branch_4x commit] Uwe Schindler
        http://svn.apache.org/viewvc?view=revision&revision=1429420

        Merged revision(s) 1429419 from lucene/dev/trunk:
        SOLR-4275: Fix test. Sorry, the Solr build system did not recognize the test change without ant clean!?


          People

          • Assignee: Uwe Schindler
          • Reporter: Uwe Schindler
