Solr / SOLR-1400

Document with empty or white-space only string causes exception with TrimFilter

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: update
    • Labels: None

      Description

      Observed with Solr trunk. Posting any empty or whitespace-only string to a field using the

      <filter class="solr.TrimFilterFactory" />

      causes a Java exception:

      Sep 1, 2009 4:58:09 PM org.apache.solr.common.SolrException log
      SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
      	at org.apache.solr.analysis.TrimFilter.incrementToken(TrimFilter.java:63)
      	at org.apache.solr.analysis.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:74)
      	at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
      	at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
      	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
      	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
      	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
      	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
      	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
      	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
      	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
      	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
      	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
      	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
      	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
      	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
      	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
      

      Trimming an empty or whitespace-only string should not fail.
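      To illustrate the kind of failure reported above, here is a minimal sketch of a trim routine whose backward scan has no lower bound. This is not the TrimFilter source; the class and method names (NaiveTrimDemo, naiveTrim, fails) are hypothetical, and the sketch only assumes that the filter walks a char buffer looking for the first and last non-whitespace characters.

```java
// Illustrative sketch only; this is not the TrimFilter source.
final class NaiveTrimDemo {

    /** Trims buf[0..len) using a backward scan with no lower bound. */
    static String naiveTrim(char[] buf, int len) {
        int end = len;
        // No bounds check: for "" or "   " this ends up reading buf[-1].
        while (Character.isWhitespace(buf[end - 1])) {
            end--;
        }
        int start = 0;
        while (Character.isWhitespace(buf[start])) {
            start++;
        }
        return new String(buf, start, end - start);
    }

    /** Returns true if naiveTrim throws ArrayIndexOutOfBoundsException for s. */
    static boolean fails(String s) {
        try {
            naiveTrim(s.toCharArray(), s.length());
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + naiveTrim(" a ".toCharArray(), 3) + "]");
        System.out.println("empty input fails: " + fails(""));
        System.out.println("whitespace-only input fails: " + fails("   "));
    }
}
```

      Input containing at least one non-whitespace character stops both scans in time; empty and whitespace-only input do not, which is the class of input this issue describes.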

      1. SOLR-1400.patch
        3 kB
        Grant Ingersoll
      2. trim-example.xml
        2 kB
        Peter Wolanin

        Activity

        Grant Ingersoll added a comment -

        Bulk close Solr 1.4 issues

        Grant Ingersoll added a comment -

        Committed revision 812494.

        Peter Wolanin added a comment -

        These lines seem to vary as to whether there is whitespace between "char" and the []:

        @@ -29,29 +30,48 @@
         public class TestTrimFilter extends BaseTokenTestCase {
           
           public void testTrim() throws Exception {
        +    char[] a = " a ".toCharArray();
        +    char [] b = "b   ".toCharArray();
        +    char [] ccc = "cCc".toCharArray();
        +    char[] whitespace = "   ".toCharArray();
        +    char[] empty = "".toCharArray();
        
        Grant Ingersoll added a comment -

        Peter, what is the code style inconsistency?

        Sascha Szott added a comment -

        Grant, why couldn't you re-add the checks you introduced in revision #643465

        Token t = input.next(in);
        if (null == t || null == t.termBuffer() || t.termLength() == 0) {
          return t;
        }
        

        and adjust them to the changes in the API, which results in

        if (termAtt == null || termAtt.termBuffer() == null || termAtt.termLength() == 0) {
          return true;
        }
        

        To be honest, I'm not sure which return value suits best here.
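        On the return-value question: in the incrementToken() API, true means a token is available and false signals the end of the stream, so a guard that returns false for an empty token would silently drop every token after it. The sketch below models this with hypothetical stand-in classes (Source, Trimmer, consume); none of them are Lucene classes, and String.trim() stands in for the real buffer manipulation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not Lucene classes.
final class GuardSketch {

    /** Minimal model of a token stream: incrementToken() is false at end of stream. */
    static final class Source {
        private final String[] tokens;
        private int pos = -1;

        Source(String... tokens) { this.tokens = tokens.clone(); }

        boolean incrementToken() { return ++pos < tokens.length; }
        String term() { return tokens[pos]; }
        void setTerm(String t) { tokens[pos] = t; }
    }

    /** Trimming filter whose guard returns a configurable value for empty tokens. */
    static final class Trimmer {
        private final Source input;
        private final boolean emptyReturnValue;

        Trimmer(Source input, boolean emptyReturnValue) {
            this.input = input;
            this.emptyReturnValue = emptyReturnValue;
        }

        boolean incrementToken() {
            if (!input.incrementToken()) {
                return false;                 // upstream is exhausted
            }
            if (input.term().length() == 0) {
                return emptyReturnValue;      // the guard under discussion
            }
            input.setTerm(input.term().trim());
            return true;
        }
    }

    /** Drains the filter the way a consumer would and collects the terms it sees. */
    static List<String> consume(boolean emptyReturnValue, String... tokens) {
        Source src = new Source(tokens);
        Trimmer filter = new Trimmer(src, emptyReturnValue);
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(src.term());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consume(true, " a ", "", "b "));
        System.out.println(consume(false, " a ", "", "b "));
    }
}
```

        With the guard returning true, the consumer sees all three tokens, including the empty one; with false, it stops after the first token.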

        Peter Wolanin added a comment -

        The patch seems to fix the bug for me, but there seems to be some code style inconsistency in the test code.

        Yonik Seeley added a comment -

        +1
        Although the logic could probably be simplified such that a zero-length test wouldn't even be needed, this is the simplest fix.
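        One way such a simplification might look is sketched below. This is an assumption about the idea, not the committed code: both scans are bounded by each other, so empty and all-whitespace buffers fall out as a zero-length result without a separate length check. The names BoundedTrim, trimBounds, and trim are hypothetical.

```java
// Illustrative sketch only; not the patch committed for this issue.
final class BoundedTrim {

    /** Returns {start, length} of the trimmed region of buf[0..len). */
    static int[] trimBounds(char[] buf, int len) {
        int start = 0;
        int end = len;
        // Each loop is bounded by the other index, so neither can leave the buffer.
        while (start < end && Character.isWhitespace(buf[start])) {
            start++;
        }
        while (end > start && Character.isWhitespace(buf[end - 1])) {
            end--;
        }
        return new int[] {start, end - start};
    }

    /** Convenience wrapper that applies trimBounds to a String. */
    static String trim(String s) {
        char[] buf = s.toCharArray();
        int[] bounds = trimBounds(buf, buf.length);
        return new String(buf, bounds[0], bounds[1]);
    }

    public static void main(String[] args) {
        // Inputs mirror the strings used in the test patch quoted in this thread.
        for (String s : new String[] {" a ", "b   ", "cCc", "   ", ""}) {
            System.out.println("[" + trim(s) + "]");
        }
    }
}
```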

        Grant Ingersoll added a comment -

        Try this out.

        Grant Ingersoll added a comment -

        Hmm, TrimFilter has a test for all-whitespace input.

        Peter Wolanin added a comment -

        Post the attached document using the trunk sample schema.xml to reproduce.


          People

          • Assignee: Grant Ingersoll
          • Reporter: Peter Wolanin
          • Votes: 0
          • Watchers: 0
