Lucene - Core / LUCENE-8347

BlendedInfixSuggester to handle multi term matches better


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 7.3.1
    • Fix Version/s: None
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      Currently the BlendedInfixSuggester considers only the first match position when scoring a suggestion.
      From the lucene-dev mailing list:
      "
      If I write more than one term in the query, let's say

      "Mini Bar Fridge"

      I would expect the results to look something like this (note that allTermsRequired=true and the schema weight field always returns 1000):

      • Mini Bar Fridge something
      • Mini Bar Fridge something else
      • Mini Bar something Fridge        
      • Mini Bar something else Fridge
      • Mini something Bar Fridge
      • ...

      Instead I see this:

      • Mini Bar something Fridge
      • Mini Bar something else Fridge
      • Mini Bar Fridge something
      • Mini Bar Fridge something else
      • Mini something Bar Fridge
      • ...

      After having a look at the suggester code (BlendedInfixSuggester.createCoefficient), I see that the component takes into account only one position, the lowest position (among the three matching terms) within the term vector ("mini" in the example above), so all the suggestions above have the same weight.
      "
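The tie described in the quoted message can be reproduced with a small standalone sketch. It does not use Lucene: the class FirstMatchScoring, its methods, and the linear penalty of 0.10 per position are assumptions made for illustration, and tokens are matched by exact equality rather than through an analyzer. Only the idea of scoring from the lowest matching position is taken from the description above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FirstMatchScoring {
    // Assumption: a linear penalty per token position, chosen for illustration only.
    static final double LINEAR_COEF = 0.10;

    /** Position of the earliest token that equals any query term, or -1 if none matches. */
    static int firstMatchPosition(String suggestion, List<String> queryTerms) {
        String[] tokens = suggestion.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (queryTerms.contains(tokens[i])) {
                return i;
            }
        }
        return -1;
    }

    /** Score built from the single lowest match position, as the issue describes. */
    static double score(String suggestion, List<String> queryTerms, long weight) {
        int pos = firstMatchPosition(suggestion, queryTerms);
        if (pos < 0) {
            return 0.0;
        }
        return weight * (1.0 - LINEAR_COEF * pos);
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("mini", "bar", "fridge");
        String[] suggestions = {
            "Mini Bar Fridge something",
            "Mini Bar Fridge something else",
            "Mini Bar something Fridge",
            "Mini Bar something else Fridge",
            "Mini something Bar Fridge"
        };
        // Every suggestion starts with "mini", so the first match is always at position 0.
        for (String s : suggestions) {
            System.out.println(score(s, query, 1000) + "  " + s);
        }
    }
}
```

Because "mini" sits at position 0 in every suggestion, all five receive the same score in this sketch, so their relative order is decided by something other than how closely they match the query.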
      The scope of this Jira issue is to improve BlendedInfixSuggester so that it handles these scenarios better.
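One possible direction, sketched below, is to let every query term contribute its own position to the coefficient. This is a hypothetical illustration and not the approach taken by the attached patch: the class MultiTermScoring, the reciprocal form 1/(position + 1), and the averaging across terms are all assumptions, and tokens are matched by exact equality without an analyzer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class MultiTermScoring {
    /** Index of the first token equal to the term, or -1 when the term is absent. */
    static int firstPosition(String[] tokens, String term) {
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Mean of 1/(position + 1) over all query terms.
     * Returns 0 when any term is missing, imitating allTermsRequired=true.
     */
    static double coefficient(String suggestion, List<String> queryTerms) {
        if (queryTerms.isEmpty()) {
            return 0.0;
        }
        String[] tokens = suggestion.toLowerCase(Locale.ROOT).trim().split("\\s+");
        double sum = 0.0;
        for (String term : queryTerms) {
            int pos = firstPosition(tokens, term);
            if (pos < 0) {
                return 0.0;
            }
            sum += 1.0 / (pos + 1);
        }
        return sum / queryTerms.size();
    }

    static double score(String suggestion, List<String> queryTerms, long weight) {
        return weight * coefficient(suggestion, queryTerms);
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("mini", "bar", "fridge");
        String[] suggestions = {
            "Mini Bar something Fridge",
            "Mini Bar something else Fridge",
            "Mini Bar Fridge something",
            "Mini Bar Fridge something else",
            "Mini something Bar Fridge"
        };
        for (String s : suggestions) {
            System.out.println(score(s, query, 1000) + "  " + s);
        }
    }
}
```

With this coefficient the two "Mini Bar Fridge ..." suggestions tie for the highest score, followed by "Mini Bar something Fridge", "Mini Bar something else Fridge", and "Mini something Bar Fridge", which is the order the reporter expected. Other combinations (a product of per-term coefficients, or a penalty on the gaps between matches) would be equally plausible.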

      Attachments

        1. LUCENE-8347.patch
          23 kB
          Alessandro Benedetti
        2. LUCENE-8347.patch
          23 kB
          Alessandro Benedetti


          People

            Assignee: Unassigned
            Reporter: Alessandro Benedetti (abenedetti)

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 10m