  1. Solr
  2. SOLR-11662

Make overlapping query term scoring configurable per field type

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.2, 8.0
    • Component/s: None
    • Labels: None

    Description

      This patch makes the query-time behavior configurable when query terms overlap positions. Right now the only option is SynonymQuery, which is a fantastic default and an improvement on past versions. However, there are use cases where terms overlap positions but don't carry exact synonymy relationships. Synonym filters (or other analyzers) are often used to model hypernym/hyponym relationships rather than true synonyms. In those cases the individual term scores matter: terms with higher specificity (hyponyms) should score higher than terms with lower specificity (hypernyms).

      This patch adds the fieldType setting scoreOverlaps, as in:

        <fieldType name="text_general"  scoreOverlaps="pick_best"  class="solr.TextField" positionIncrementGap="100" multiValued="true">
      
      

      Valid values for scoreOverlaps are:

      as_one_term
      The default, suited to most synonym use cases. Uses SynonymQuery.
      Treats all terms as if they're exactly equivalent, with the document frequency from the underlying terms blended.

      pick_best
      For a given document, score using the best-scoring synonym (i.e. a dismax over the generated terms).
      Useful when synonyms are not exactly equivalent and are instead used to model hypernym/hyponym relationships, such as expanding a term to broader terms, so that each term's own score reflects its specificity.
      For example, given this query-time expansion:

      tabby => tabby, cat, animal

      Searching the field "text" for tabby generates the dismax (text:tabby | text:cat | text:animal)

      as_distinct_terms
      (The pre-6.0 behavior.)
      A compromise between pick_best and as_one_term.
      Appropriate when synonyms reflect a hypernym/hyponym relationship but scores should stack, so the more occurrences of tabby, cat, or animal a document has the better, with a bias towards the term with the highest specificity.
      Terms are turned into a boolean OR query, with document frequencies not blended.
      For example, given this query-time expansion:

      tabby => tabby, cat, animal

      Searching the field "text" for tabby generates the boolean query (text:tabby text:cat text:animal)
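
      To make the difference between the three values concrete, here is a minimal Python sketch of how each one might combine per-term scores. It is not Solr or Lucene code. The index statistics, the simplified BM25-style formula (length normalization omitted), and the helper names idf, tf_weight, and score are all hypothetical. The as_one_term branch assumes that "blended" means term frequencies are summed and the largest document frequency is used, which is an approximation of SynonymQuery rather than a statement of its exact implementation.

```python
import math

# Toy index statistics (hypothetical numbers, for illustration only).
NUM_DOCS = 1000
DOC_FREQ = {"tabby": 10, "cat": 100, "animal": 400}
TERMS = ["tabby", "cat", "animal"]  # expansion of the query term "tabby"


def idf(doc_freq, num_docs=NUM_DOCS):
    # BM25-style inverse document frequency: rarer terms weigh more.
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def tf_weight(freq, k1=1.2):
    # Saturating term-frequency component (length normalization omitted).
    return freq * (k1 + 1) / (freq + k1)


def score(doc_tf, mode, terms=TERMS):
    """Score one document (a term -> frequency dict) under a given mode."""
    if mode == "as_one_term":
        # Assumption: all terms act as a single term, with frequencies
        # summed and the largest document frequency used for the IDF.
        freq = sum(doc_tf.get(t, 0) for t in terms)
        blended_df = max(DOC_FREQ[t] for t in terms)
        return idf(blended_df) * tf_weight(freq) if freq else 0.0

    # The other two modes score each term on its own statistics.
    per_term = [
        idf(DOC_FREQ[t]) * tf_weight(doc_tf[t]) if doc_tf.get(t) else 0.0
        for t in terms
    ]
    if mode == "pick_best":
        return max(per_term)   # dismax: keep only the best-scoring term
    if mode == "as_distinct_terms":
        return sum(per_term)   # boolean OR: scores stack
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    docs = {
        "tabby only": {"tabby": 1},
        "animal only": {"animal": 1},
        "all three": {"tabby": 1, "cat": 1, "animal": 1},
    }
    for mode in ("as_one_term", "pick_best", "as_distinct_terms"):
        for name, doc_tf in docs.items():
            print(f"{mode:18s} {name:12s} {score(doc_tf, mode):.3f}")
```

      Under these assumptions, as_one_term gives the "tabby only" and "animal only" documents the same score, pick_best ranks the more specific "tabby only" document higher, and as_distinct_terms additionally rewards the document that contains all three terms.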

            People

              Assignee: dsmiley David Smiley
              Reporter: softwaredoug Doug Turnbull
              Votes: 4
              Watchers: 9

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 10m