Solr / SOLR-7304

Spellcheck.collate Sometimes Invalidates Range Queries

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.9
    • Fix Version/s: 4.9, 5.5
    • Component/s: spellchecker
    • Environment: Jetty, Debian

    Description

      I get an error from SpellCheckComponent since I added this SearchComponent to the /select RequestHandler (see this excerpt from solrconfig.xml):

      <requestHandler name="/select" class="solr.SearchHandler">
        <!-- default values for query parameters can be specified, these
             will be overridden by parameters in the request
        -->
        <lst name="defaults">
          <str name="echoParams">explicit</str>
          <int name="rows">10</int>
          <str name="df">titre</str>

          <!-- h4k1m: configure spellcheck if enabled -->
          <str name="spellcheck">on</str>
          <str name="spellcheck.dictionary">default</str>
          <str name="spellcheck.extendedResults">true</str>
          <str name="spellcheck.count">3</str>
          <str name="spellcheck.alternativeTermCount">3</str>
          <str name="spellcheck.maxResultsForSuggest">5</str>
          <str name="spellcheck.collate">true</str>
          <str name="spellcheck.collateExtendedResults">true</str>
          <str name="spellcheck.maxCollationTries">10</str>
          <str name="spellcheck.maxCollations">1</str>
          <str name="spellcheck.onlyMorePopular">false</str>
          <str name="combineWords">false</str>
        </lst>
      </requestHandler>

      The error seems to be related to range queries in which the TO of [.. TO ..] ends up in lowercase. The re-query that the SpellCheck component issues to test a collation contains 'to' in lowercase, and parsing it throws the RANGE_GOOP error.
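      To make the failure mode concrete, here is a minimal Java sketch (a hypothetical helper, not Solr code) that scans a query string for range clauses whose operator is not the uppercase TO the standard query parser expects. The regex is only a rough approximation of the real grammar: it ignores quoting and escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: flags range clauses such as
// prix:[2000016 to 2250008} whose operator word is not exactly "TO".
class RangeOperatorCheck {

    // Opening bracket ([ or {), lower bound, operator word, upper bound, closing bracket (] or }).
    // Rough approximation: a bound is any run of characters other than whitespace or brackets.
    private static final Pattern RANGE = Pattern.compile(
            "([\\[{])\\s*([^\\s\\[\\]{}]+)\\s+(\\w+)\\s+([^\\s\\[\\]{}]+)\\s*([\\]}])");

    /** Returns every range clause whose operator word is not exactly "TO". */
    static List<String> badRanges(String query) {
        List<String> bad = new ArrayList<>();
        Matcher m = RANGE.matcher(query);
        while (m.find()) {
            if (!m.group(3).equals("TO")) {
                bad.add(m.group());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Range clauses taken from the re-query in the log
        String rewritten = "prix:[2000016 to 2250008} AND anneemodele:[2003 to 2008}";
        System.out.println(badRanges(rewritten));
    }
}
```

      Run against the re-query from the log, it flags both range clauses; a clause written as id:[1 TO 10] passes.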

      101615 [qtp2145626092-38] WARN org.apache.solr.spelling.SpellCheckCollator - Exception trying to re-query to check if a spell check possibility would return any hits.
      org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Cannot parse 'offredemande:offre AND categorieparente:"audi" AND prix:[2000016 to 2250008} AND anneemodele:[2003 to 2008} AND etat:"nauf"': Encountered " <RANGE_GOOP> "2250008 "" at line 1, column 68.
      Was expecting one of:
      "]" ...
      "}" ...

      at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:205)
      at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:141)
      at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:230)
      at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:197)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1962)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
      at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1645)
      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:564)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111)
      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:498)
      at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183)
      at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
      at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:199)
      at org.eclipse.jetty.server.handler.IPAccessHandler.handle(IPAccessHandler.java:220)
      at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:98)
      at org.eclipse.jetty.server.Server.handle(Server.java:461)
      at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:284)
      at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
      at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534)
      at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607)
      at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.solr.search.SyntaxError: Cannot parse 'offredemande:offre AND categorieparente:"audi" AND prix:[2000016 to 2250008} AND anneemodele:[2003 to 2008} AND etat:"nauf"': Encountered " <RANGE_GOOP> "2250008 "" at line 1, column 68.
      Was expecting one of:
      "]" ...
      "}" ...

      at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:156)
      at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
      at org.apache.solr.search.QParser.getQuery(QParser.java:141)
      at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:148)
      ... 30 more
      Caused by: org.apache.solr.parser.ParseException: Encountered " <RANGE_GOOP> "2250008 "" at line 1, column 68.
      Was expecting one of:
      "]" ...
      "}" ...

      at org.apache.solr.parser.QueryParser.generateParseException(QueryParser.java:649)
      at org.apache.solr.parser.QueryParser.jj_consume_token(QueryParser.java:531)
      at org.apache.solr.parser.QueryParser.Term(QueryParser.java:358)
      at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:185)
      at org.apache.solr.parser.QueryParser.Query(QueryParser.java:139)
      at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:96)
      at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:152)
      ... 33 more

      Attachments

      1. SOLR-7304.patch (4 kB, James Dyer)
      2. SOLR-7304.patch (2 kB, James Dyer)

        Activity

        Hakim added a comment -

        It seems that spellcheck.collate is causing this error: when I set it to false, the error disappeared from the log files.
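        Because parameters in the request override the /select handler defaults, the same workaround can also be applied to a single request rather than the whole handler. A minimal JDK-only sketch, assuming a Solr core reachable at http://localhost:8983/solr/mycore (the core name is a hypothetical placeholder); it only builds the request URL and does not send it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Builds (but does not send) a /select request URL whose parameters override
// the handler defaults. The core URL is a hypothetical placeholder.
class SelectUrl {

    static String build(String coreUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
        return coreUrl + "/select?" + query;
    }

    // Form-encodes a parameter name or value (spaces become '+').
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "prix:[2000016 TO 2250008}");
        params.put("spellcheck.collate", "false");          // overrides the "true" default
        System.out.println(build("http://localhost:8983/solr/mycore", params));
    }
}
```

        The trade-off is that such a request gets no collations at all, while the other spellcheck suggestions are still returned.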

        James Dyer added a comment -

        Can you post the original query request here? Are you saying the collator lowercases the word "to", so that when it tests the collation the query is invalid? Or did the original query have invalid range syntax as well? Also, it looks like the closing bracket is incorrect: I'm seeing a } instead of a ]. Did the original query have these brackets as well, or were they introduced by the collator? (Or did JIRA corrupt this?)

        Hakim added a comment -

        Actually, the collator lowercases the entire query.
        No, the original query works fine, and so does the range: in Lucene, square brackets denote an inclusive bound and curly brackets an exclusive bound, so a mixed pair such as [a TO b} is valid.
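        Since each bound is chosen independently, the mixed brackets in the log are legal. The following illustrative Java helper (hypothetical, not a Lucene API) spells out what a numeric clause such as prix:[2000016 TO 2250008} matches, namely 2000016 <= prix < 2250008.

```java
// Illustrative helper, not a Lucene API: evaluates a numeric range clause the way
// the bracket syntax reads it. '[' and ']' are inclusive bounds, '{' and '}' exclusive.
class RangeSemantics {

    static boolean inRange(long value, char open, long lower, long upper, char close) {
        if ((open != '[' && open != '{') || (close != ']' && close != '}')) {
            throw new IllegalArgumentException("bad brackets: " + open + " " + close);
        }
        boolean aboveLower = (open == '[') ? value >= lower : value > lower;
        boolean belowUpper = (close == ']') ? value <= upper : value < upper;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        // prix:[2000016 TO 2250008}  means  2000016 <= prix < 2250008
        System.out.println(inRange(2000016, '[', 2000016, 2250008, '}')); // true, lower bound included
        System.out.println(inRange(2250008, '[', 2000016, 2250008, '}')); // false, upper bound excluded
    }
}
```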

        I worked around this error by setting the following property in the spellcheck searchComponent: <str name="queryAnalyzerFieldType">text_suggest</str>. Note that text_suggest is a custom field type I created; it is a copy of text_general without the solr.LowerCaseFilterFactory filter in the query analyzer:

        <fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <!-- h4k1m: no lower case on query (fix bug with spellcheck queries) -->
            <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
          </analyzer>
        </fieldType>

        I think the LowerCaseFilterFactory in the query analyzer is what caused the spellcheck query to be lowercased.
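        Conceptually, a query analyzer applies a chain of token filters in order, and this workaround simply removes the lowercasing step. The toy Java model below is a sketch under stated assumptions, not Lucene's Analyzer API: whitespace splitting stands in for StandardTokenizerFactory, and the stop-word and synonym filters are omitted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Toy model of a query analyzer: per-token filters applied in order.
// Not Lucene's API; whitespace splitting stands in for the real tokenizer.
class ToyAnalyzer {

    static final UnaryOperator<String> LOWERCASE = t -> t.toLowerCase(Locale.ROOT);

    static List<String> analyze(String text, List<UnaryOperator<String>> filters) {
        return Arrays.stream(text.trim().split("\\s+"))
                .map(token -> {
                    String out = token;
                    for (UnaryOperator<String> f : filters) {
                        out = f.apply(out); // each filter sees the previous filter's output
                    }
                    return out;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String q = "2003 TO 2008";
        System.out.println(analyze(q, List.of(LOWERCASE))); // text_general-like chain: TO becomes to
        System.out.println(analyze(q, List.of()));          // text_suggest-like chain: TO is kept
    }
}
```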

        Paul added a comment -

        I'm getting the same problem on version 5.0.

        James Dyer added a comment (edited) -

        Attached is a patch with a failing unit test. To reproduce this issue we use "spellcheck.alternativeTermCount" while having the word "to" in the index. We also use a "queryAnalyzerFieldType" that performs lowercasing.

        The test case queries:

        id:[1 TO 10] AND lowerfilt:lovw

        And expects back:

        id:[1 TO 10] AND lowerfilt:love

        But instead gets:

        id:[1 to 10] AND lowerfilt:love

        Both "to" and "and" are in the index. However, SpellingQueryConverter treats the boolean AND/OR/NOT operators specially. I think the easiest fix here is to have S.Q.C. also treat "TO" specially, at least in cases where it occurs somewhere after [ or { and somewhere before ] or }.
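        The behaviour described above can be modeled in a few lines. This is a simplified, hypothetical sketch, not the SpellingQueryConverter or SpellCheckCollator code: it assumes every word token other than AND/OR/NOT is written back into the collation in its analyzed (lowercased) form unless a correction is supplied, which is enough to reproduce the test case's output. The suggestion map is made up for the example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of the buggy behaviour, not Solr's implementation.
// Assumption: every word token that is not a boolean operator is written back
// into the collation in its analyzed (lowercased) form, or replaced by a suggestion.
class NaiveCollator {

    private static final Set<String> BOOLEAN_OPS = Set.of("AND", "OR", "NOT");
    // Group 1 captures "field:" prefixes so field names are left alone; group 2 is a bare word.
    private static final Pattern WORD = Pattern.compile("(\\w+:)|(\\w+)");

    static String collate(String query, Map<String, String> suggestions) {
        Matcher m = WORD.matcher(query);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null || BOOLEAN_OPS.contains(m.group(2))) {
                replacement = m.group();                      // field name or AND/OR/NOT: keep
            } else {
                String analyzed = m.group(2).toLowerCase(Locale.ROOT);
                replacement = suggestions.getOrDefault(analyzed, analyzed);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String q = "id:[1 TO 10] AND lowerfilt:lovw";
        System.out.println(collate(q, Map.of("lovw", "love")));
        // prints: id:[1 to 10] AND lowerfilt:love
    }
}
```

        Because TO is not in the protected set, it comes back as to, which is exactly the invalid range the collator then re-queries.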

        James Dyer added a comment -

        Here is a patch with the fix. I will commit this next week if everything checks out ok.
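        The approach proposed earlier in this thread (treat TO specially, but only inside range brackets) can be sketched on a simplified model of the collator. This is a hypothetical illustration, not the attached patch: it assumes non-operator words are rewritten in lowercased form, tracks bracket depth while scanning, and leaves TO untouched only while a [ or { is open, so a TO outside a range is still analyzed like any other word.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the fix on a simplified model (not the committed SOLR-7304 patch):
// "TO" is protected like AND/OR/NOT, but only while inside a [ or { range bracket.
class RangeAwareCollator {

    private static final Set<String> BOOLEAN_OPS = Set.of("AND", "OR", "NOT");
    // Group 1 captures "field:" prefixes; group 2 is a bare word.
    private static final Pattern WORD = Pattern.compile("(\\w+:)|(\\w+)");

    static String collate(String query, Map<String, String> suggestions) {
        Matcher m = WORD.matcher(query);
        StringBuilder out = new StringBuilder();
        int depth = 0;   // > 0 while inside [ ... ] or { ... } (mixed pairs allowed)
        int scanned = 0; // characters before this index have been checked for brackets
        while (m.find()) {
            // Update bracket depth for the text between the previous token and this one.
            for (int i = scanned; i < m.start(); i++) {
                char c = query.charAt(i);
                if (c == '[' || c == '{') {
                    depth++;
                } else if ((c == ']' || c == '}') && depth > 0) {
                    depth--;
                }
            }
            scanned = m.end();
            String word = m.group(2);
            String replacement;
            if (word == null || BOOLEAN_OPS.contains(word) || (depth > 0 && word.equals("TO"))) {
                replacement = m.group(); // field name, boolean operator, or range TO: keep
            } else {
                String analyzed = word.toLowerCase(Locale.ROOT);
                replacement = suggestions.getOrDefault(analyzed, analyzed);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collate("id:[1 TO 10] AND lowerfilt:lovw", Map.of("lovw", "love")));
        // prints: id:[1 TO 10] AND lowerfilt:love
    }
}
```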

        ASF subversion and git services added a comment -

        Commit 1718415 from jdyer@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1718415 ]

        SOLR-7304: SpellingQueryConverter to ignore "TO" as a possible range query operator

        ASF subversion and git services added a comment -

        Commit 1718416 from jdyer@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1718416 ]

        SOLR-7304: SpellingQueryConverter to ignore "TO" as a possible range query operator

        James Dyer added a comment -

        Thanks Hakim for reporting this.


          People

          • Assignee: James Dyer
          • Reporter: Hakim
          • Votes: 1
          • Watchers: 4
