  1. Lucene - Core
  2. LUCENE-8347

BlendedInfixSuggester to handle multi term matches better

Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 7.3.1
    • Fix Version/s: None
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      Currently the BlendedInfixSuggester considers only the first match position when scoring a suggestion.
      From the lucene-dev mailing list:
      "
      If I write more than one term in the query, for example

      "Mini Bar Fridge"

      I would expect the results to be ordered like this (note that allTermsRequired=true and the schema weight field always returns 1000):

      • Mini Bar Fridge something
      • Mini Bar Fridge something else
      • Mini Bar something Fridge        
      • Mini Bar something else Fridge
      • Mini something Bar Fridge
      ...

      Instead I see this:

      • Mini Bar something Fridge
      • Mini Bar something else Fridge
      • Mini Bar Fridge something
      • Mini Bar Fridge something else
      • Mini something Bar Fridge
      ...

      After having a look at the suggester code (BlendedInfixSuggester.createCoefficient), I see that the component takes into account only one position: the lowest position among the three matching terms within the term vector ("mini" in the example above). As a result, all the suggestions above get the same weight.
      "
      The scope of this Jira issue is to improve the BlendedInfixSuggester so that it handles these scenarios better.
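
To make the reported behaviour concrete, here is a minimal Java sketch. It is not the code from the attached patch and does not call Lucene. It assumes plain whitespace tokenization, a reciprocal position coefficient 1/(1+position) modeled on the suggester's POSITION_RECIPROCAL blender type, and a hypothetical multi-term variant that averages the per-term coefficients. The class name and all method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: names and the averaging strategy are assumptions,
// not the Lucene implementation and not the approach taken in the patch.
final class PositionCoefficientSketch {

    /** First position of each query term in the suggestion, or -1 if the term is absent. */
    static int[] matchPositions(String suggestion, String query) {
        List<String> tokens =
            Arrays.asList(suggestion.toLowerCase(Locale.ROOT).trim().split("\\s+"));
        String[] terms = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        int[] positions = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            positions[i] = tokens.indexOf(terms[i]);
        }
        return positions;
    }

    /** Reciprocal position coefficient: earlier matches score higher. */
    static double reciprocal(int position) {
        return 1.0 / (1 + position);
    }

    /** Behaviour described in the issue: only the lowest matching position counts. */
    static double firstPositionCoefficient(String suggestion, String query) {
        int min = Integer.MAX_VALUE;
        for (int p : matchPositions(suggestion, query)) {
            if (p >= 0) {
                min = Math.min(min, p);
            }
        }
        return min == Integer.MAX_VALUE ? 0.0 : reciprocal(min);
    }

    /** Hypothetical alternative: average the coefficient over every query term. */
    static double multiTermCoefficient(String suggestion, String query) {
        int[] positions = matchPositions(suggestion, query);
        double sum = 0.0;
        for (int p : positions) {
            if (p < 0) {
                return 0.0; // mimics allTermsRequired=true: every term must match
            }
            sum += reciprocal(p);
        }
        return sum / positions.length;
    }

    public static void main(String[] args) {
        String query = "Mini Bar Fridge";
        String[] suggestions = {
            "Mini Bar Fridge something",
            "Mini Bar Fridge something else",
            "Mini Bar something Fridge",
            "Mini Bar something else Fridge",
            "Mini something Bar Fridge",
        };
        for (String s : suggestions) {
            System.out.printf(Locale.ROOT, "%-32s first=%.3f multi=%.3f%n",
                s, firstPositionCoefficient(s, query), multiTermCoefficient(s, query));
        }
    }
}
```

With the five example suggestions, the first-position coefficient is identical for all of them because "mini" is always at position 0, so nothing separates them when the weight is constant. The averaged coefficient ranks them in the order the reporter expected. Other ways of combining positions (a product, a sum of position gaps, and so on) would also break the tie, possibly in a different order.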

      Attachments

        1. LUCENE-8347.patch
          23 kB
          Alessandro Benedetti
        2. LUCENE-8347.patch
          23 kB
          Alessandro Benedetti

            People

              Assignee: Unassigned
              Reporter: Alessandro Benedetti (abenedetti)
              Votes: 1
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 10m