SOLR-13190

Fuzzy search treated as server error instead of client error when terms are too complex


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.5, 7.7, 8.0
    • Fix Version/s: 8.5.2, 8.6, 9.0
    • Component/s: search
    • Labels: None

    Description

      We've seen a fuzzy search exceed the automaton determinization limit (TooComplexToDeterminizeException) and get reported as a server error. Handling of this case should be improved by:
      1) reporting it as a client error, because an operator should deal with it the same way as a "too many boolean clauses" query
      2) reporting which field is causing the error, since that currently must be deduced from adjacent query logs and can be difficult if there are multiple terms in the search

      This limit was added to defend against adversarial regexes but somehow hits fuzzy terms as well. I don't understand enough about the automaton mechanisms to really know how to approach a fix there, but improving the operability is a good first step.

      relevant stack trace:

      org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 13632 states and 21348 transitions would result in more than 10000 states.
      	at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:746)
      	at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:69)
      	at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
      	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
      	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:133)
      	at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:143)
      	at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
      	at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
      	at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
      	at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
      	at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:310)
      	at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:667)
      	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:442)
      	at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:200)
      	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1604)
      	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1420)
      	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:567)
      	at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1435)
      	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:374)
      	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
      	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
      	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2559)
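
The two improvements proposed in the description can be pictured as a thin wrapper around the query rewrite step: catch the determinization failure and rethrow it as a 400-class error whose message names the field and term. This is only a sketch of the idea, not the patch that resolved this issue. `TooComplexException`, `ClientErrorException`, `BAD_REQUEST`, and `rewriteOrClientError` are hypothetical stand-ins so that the example compiles without Lucene or Solr on the classpath; in real code the caught type would be Lucene's `TooComplexToDeterminizeException`, and the rethrown type would presumably be Solr's own client-error exception.

```java
import java.util.function.Supplier;

public class FuzzyErrorSketch {

    /** Hypothetical stand-in for Lucene's TooComplexToDeterminizeException. */
    static class TooComplexException extends RuntimeException {
        TooComplexException(String message) {
            super(message);
        }
    }

    /** Hypothetical client-facing error that carries an HTTP-style status code. */
    static class ClientErrorException extends RuntimeException {
        final int code;

        ClientErrorException(int code, String message, Throwable cause) {
            super(message, cause);
            this.code = code;
        }
    }

    /** Assumed status code for "the request itself is the problem". */
    static final int BAD_REQUEST = 400;

    /**
     * Runs the supplied rewrite step. If it fails because the automaton is
     * too complex, rethrows as a client error that names the field and term,
     * keeping the original exception as the cause.
     */
    static <T> T rewriteOrClientError(String field, String term, Supplier<T> rewrite) {
        try {
            return rewrite.get();
        } catch (TooComplexException e) {
            throw new ClientErrorException(
                BAD_REQUEST,
                "Fuzzy term too complex for field '" + field + "': " + term,
                e);
        }
    }

    public static void main(String[] args) {
        try {
            // Simulate a rewrite that trips the determinization limit.
            rewriteOrClientError("title", "someverylongterm~2", () -> {
                throw new TooComplexException("would result in more than 10000 states");
            });
        } catch (ClientErrorException e) {
            System.out.println(e.code + " " + e.getMessage());
        }
    }
}
```

Keeping the original exception as the cause preserves the state and transition counts from the stack trace above for anyone debugging, while the status code and message give the operator what the description asks for.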
      

          People

            Assignee: Mike Drob (mdrob)
            Reporter: Mike Drob (mdrob)
            Votes: 0
            Watchers: 4

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 40m