SOLR-6085

Suggester crashes when prefixToken is longer than surface form

    Details

      Description

      The AnalyzingInfixSuggester class fails when it is queried with the ß character (Eszett) used in German, although this does not happen for all data or for all words containing this character. The exception reported is the following:

      
      <response>
      <lst name="responseHeader">
      <int name="status">500</int>
      <int name="QTime">18</int>
      </lst>
      <lst name="error">
      <str name="msg">String index out of range: 5</str>
      <str name="trace">
      java.lang.StringIndexOutOfBoundsException: String index out of range: 5
        at java.lang.String.substring(String.java:1907)
        at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.addPrefixMatch(AnalyzingInfixSuggester.java:575)
        at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.highlight(AnalyzingInfixSuggester.java:525)
        at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.createResults(AnalyzingInfixSuggester.java:479)
        at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:437)
        at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:338)
        at org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:181)
        at org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:232)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:217)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:744)
      </str>
      <int name="code">500</int>
      </lst>
      </response>
      

      The exception is triggered by this query:

      http://localhost:8983/solr/suggest_de?suggest.q=gieß (for gießen, which is actually in the data)

      The problem seems to be that we use ASCIIFolding to unify ss and ß, which are both valid alternatives in German.
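
To illustrate why folding matters here, the following is a minimal sketch, not Lucene code. The `fold` helper is hypothetical: it stands in for the analysis chain and handles only the ß case. It shows that the analyzed form of a word can be longer than its surface form. The ß character is written as the escape `\u00DF` so that the example does not depend on the source file encoding.

```java
final class FoldingLengthSketch {
    // Hypothetical stand-in for the analyzer's ASCII folding.
    // It only rewrites U+00DF (sharp s) to "ss"; a real folding filter covers far more characters.
    static String fold(String s) {
        return s.replace("\u00DF", "ss");
    }

    public static void main(String[] args) {
        String surface = "da\u00DF";      // "da" followed by sharp s: 3 chars
        String analyzed = fold(surface);  // "dass": 4 chars
        System.out.println(surface.length() + " -> " + analyzed.length());
    }
}
```

Any code that uses the length of the analyzed token as an index into the surface form therefore has to allow for the analyzed token being the longer of the two.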

      Looking at the code, we found that string bounds are not checked in the method involved in the exception:

      protected void addPrefixMatch(StringBuilder sb, String surface, String analyzed, String prefixToken) {
        // TODO: apps can try to invert their analysis logic
        // here, e.g. downcase the two before checking prefix:
        sb.append("<b>");
        sb.append(surface.substring(0, prefixToken.length()));
        sb.append("</b>");
        if (prefixToken.length() < surface.length()) {
          sb.append(surface.substring(prefixToken.length()));
        }
      }
      

      For example, when surface is "daß" (3 characters) and prefixToken is "dass" (4 characters), surface.substring(0, prefixToken.length()) throws a StringIndexOutOfBoundsException.
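
The failure can be reproduced without Solr. The sketch below is a standalone illustration, assuming only the JDK; the class and method names are made up for this example. It isolates the unguarded `substring` call and reports whether it throws for a given pair of strings.

```java
final class SubstringOverflowDemo {
    // Mirrors the unguarded call in addPrefixMatch:
    // surface.substring(0, prefixToken.length())
    static boolean overflows(String surface, String prefixToken) {
        try {
            surface.substring(0, prefixToken.length());
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "da\u00DF" is "da" followed by sharp s (3 chars); "dass" has 4 chars.
        System.out.println(overflows("da\u00DF", "dass"));
        // Same length on both sides: the end index stays within bounds.
        System.out.println(overflows("dass", "dass"));
    }
}
```

The call only fails when the prefix token is strictly longer than the surface form, which would be consistent with the report that not every word containing ß triggers the error.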

      A possible solution would be:

      protected void addPrefixMatch(StringBuilder sb, String surface, String analyzed, String prefixToken) {
        // TODO: apps can try to invert their analysis logic
        // here, e.g. downcase the two before checking prefix:
        sb.append("<b>");
        if (prefixToken.length() > surface.length()) {
          // The analyzed prefix is longer than the surface form: highlight the whole surface form.
          sb.append(surface);
        } else {
          sb.append(surface.substring(0, prefixToken.length()));
        }
        sb.append("</b>");
        if (prefixToken.length() < surface.length()) {
          sb.append(surface.substring(prefixToken.length()));
        }
      }
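
The same guard can also be written by clamping the end index. The following standalone sketch is an alternative formulation of the proposal above, not the attached patch and not Lucene's actual code. The class name and the `highlightPrefix` method are hypothetical, and it returns a String instead of appending to a caller-supplied StringBuilder so that it can be run on its own.

```java
final class PrefixHighlightSketch {
    // Wraps the matched prefix of the surface form in <b>...</b>.
    // The end index is clamped so that it never exceeds the surface length.
    static String highlightPrefix(String surface, String prefixToken) {
        int end = Math.min(prefixToken.length(), surface.length());
        StringBuilder sb = new StringBuilder();
        sb.append("<b>").append(surface, 0, end).append("</b>");
        // Appends the remainder of the surface form; nothing is added when end == surface.length().
        sb.append(surface, end, surface.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prefix token longer than the surface form: the whole surface form is highlighted.
        System.out.println(highlightPrefix("da\u00DF", "dass"));
        // Prefix token shorter than the surface form: only the prefix is highlighted.
        System.out.println(highlightPrefix("gie\u00DFen", "gie"));
    }
}
```

Clamping keeps a single code path for both cases, at the cost of being slightly less explicit than the if/else about the situation where the analyzed prefix is longer than the surface form.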
      
      1. SOLR-6085.patch
        3 kB
        Jan Høydahl

        Activity

        Jan Høydahl added a comment -

        Hi, is this still an issue?

        Would you be able to write a patch with a failing JUnit test case and your proposed solution, and attach it to this issue? (http://wiki.apache.org/solr/HowToContribute#Generating_a_patch)

        Jan Høydahl added a comment -

        Attaching a patch (trunk) with a test case that provokes the bug and a simple fix.

        Jan Høydahl added a comment -

        Jorge Ferrández, can you test the patch in your environment?

        Jorge Ferrández added a comment -

        Yes, sure. I will test it next week. I will post the results when I have them.

        Thank you very much for the patch. I apologize that I don't have much time to work on it.

        ASF subversion and git services added a comment -

        Commit 1638711 from janhoy@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1638711 ]

        SOLR-6085: Suggester crashes when prefixToken is longer than surface form

        ASF subversion and git services added a comment -

        Commit 1638712 from janhoy@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1638712 ]

        SOLR-6085: Suggester crashes when prefixToken is longer than surface form (merge)

        ASF subversion and git services added a comment -

        Commit 1638716 from janhoy@apache.org in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1638716 ]

        SOLR-6085: Suggester crashes when prefixToken is longer than surface form (backport)

        Jan Høydahl added a comment -

        Fixed. Jorge Ferrández, you may now build the lucene_solr_4_10 branch if you would like to test it.

        Anshum Gupta added a comment -

        Bulk close after 5.0 release.


          People

          • Assignee: Jan Høydahl
          • Reporter: Jorge Ferrández