Lucene - Core / LUCENE-6459

[suggest] Query Interface for suggest API

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.1
    • Fix Version/s: 5.3, 6.0
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

    Description

      This patch factors out the common indexing/search API used by the recently introduced NRTSuggester.
      The motivation is to provide a query interface for FST-based fields (SuggestField and ContextSuggestField)
      to enable suggestion scoring and more powerful automaton queries.

      Previously, only prefix ‘queries’ with index-time weights were supported, but we can now also support:

      • Prefix queries expressed as regular expressions: get suggestions that match multiple prefixes
        • Example: star(wa|tr) matches starwars and startrek
      • Fuzzy Prefix queries supporting scoring: get typo-tolerant suggestions scored by how close they are to the query prefix
        • Example: querying for seper will score separate higher than superstitious
      • Context Queries: get suggestions boosted and/or filtered based on their indexed contexts (metadata)
        • Boost example: get typo-tolerant suggestions on song names with a prefix like a roling, boosting songs with
          genre rock and indie
        • Filter example: get suggestions for all file names starting with finan, only for user1 and user2
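The regular-expression bullet can be checked outside Lucene. The sketch below is only an illustration: it uses java.util.regex instead of Lucene's own regular-expression syntax (an assumption; the two agree for this simple pattern), writes the alternation with grouping parentheses as star(wa|tr), and uses lookingAt() to model "the regex matches a prefix of the suggestion". The class and method names are hypothetical and not part of the patch.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: models "regex matches a prefix".
class RegexPrefixSketch {
    // lookingAt() anchors at the start of the input but does not require
    // the whole input to match, which gives prefix semantics.
    static boolean matchesPrefix(String regex, String suggestion) {
        return Pattern.compile(regex).matcher(suggestion).lookingAt();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"starwars", "startrek", "stargate"}) {
            System.out.println(s + " -> " + matchesPrefix("star(wa|tr)", s));
        }
    }
}
```

Note that a bracketed form such as star[wa|tr] is a character class (one of w, a, |, t, r), so it would also accept a value like starry; the parenthesised alternation is what restricts matches to the two intended prefixes.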

      Suggest API

      SuggestIndexSearcher searcher = new SuggestIndexSearcher(reader);
      CompletionQuery query = ...
      TopSuggestDocs suggest = searcher.suggest(query, num);
      

      CompletionQuery

      CompletionQuery is used to query SuggestField and ContextSuggestField. A CompletionQuery produces a CompletionWeight,
      which lets CompletionQuery implementations pass in an automaton to be intersected with an FST and supports boosting and
      metadata extraction from the intersected partial paths. A CompletionWeight produces a CompletionScorer. A CompletionScorer
      executes a Top N search against the FST with the provided automaton, scoring and filtering all matched paths.
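The flow described here (a query supplies a matcher, and the scorer runs a Top N search over the matching paths, filtering as it goes) can be modelled with plain collections. This is a sketch under loud assumptions: a list of entries stands in for the FST, a Predicate stands in for the automaton, and every name (TopNSketch, Entry, topN) is hypothetical and not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Predicate;

// Hypothetical stand-in for the FST search; not how Lucene implements it.
class TopNSketch {
    // One indexed suggestion with its index-time weight.
    static final class Entry {
        final String text;
        final int weight;
        Entry(String text, int weight) { this.text = text; this.weight = weight; }
    }

    // Returns the n highest-weighted entries accepted by the matcher, best first.
    static List<String> topN(List<Entry> index, Predicate<String> matcher, int n) {
        // Min-heap on weight: the weakest retained hit sits at the head.
        PriorityQueue<Entry> heap =
            new PriorityQueue<>(Comparator.comparingInt((Entry e) -> e.weight));
        for (Entry e : index) {
            if (!matcher.test(e.text)) continue; // rejected by the "automaton"
            heap.add(e);
            if (heap.size() > n) heap.poll();    // evict the current weakest hit
        }
        List<Entry> hits = new ArrayList<>(heap);
        hits.sort(Comparator.comparingInt((Entry e) -> e.weight).reversed());
        List<String> out = new ArrayList<>();
        for (Entry e : hits) out.add(e.text);
        return out;
    }

    public static void main(String[] args) {
        List<Entry> index = Arrays.asList(
            new Entry("starwars", 5), new Entry("startrek", 9),
            new Entry("stargate", 7), new Entry("moon", 10));
        System.out.println(topN(index, s -> s.startsWith("star"), 2));
    }
}
```

The bounded heap keeps memory proportional to N rather than to the number of matches, which is the usual reason to prefer it over sorting every hit.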

      PrefixCompletionQuery

      Return documents with values that match the prefix of an analyzed term text.
      Documents are sorted according to their suggest field weight.

      PrefixCompletionQuery(Analyzer analyzer, Term term)
      

      RegexCompletionQuery

      Return documents with values that match the prefix of a regular expression.
      Documents are sorted according to their suggest field weight.

      RegexCompletionQuery(Term term)
      

      FuzzyCompletionQuery

      Return documents with values that have prefixes within a specified edit distance of an analyzed term text.
      Documents are ‘boosted’ by the number of matching prefix letters of the suggestion with respect to the original term text.

      FuzzyCompletionQuery(Analyzer analyzer, Term term)
      
      Scoring

      suggestion_weight * boost
      where suggestion_weight and boost are both integers.
      boost = # of prefix characters matched
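As a worked example of this formula, the sketch below computes boost as the number of leading characters shared by the query and a suggestion, then multiplies by the suggestion weight. It follows the formula as stated in this description only; the class and method names are hypothetical, and the weights used in main are made-up illustration values.

```java
// Hypothetical illustration of "suggestion_weight * boost"; not Lucene code.
class FuzzyScoreSketch {
    // Number of leading characters shared by the query and the suggestion.
    static int commonPrefixLength(String query, String suggestion) {
        int limit = Math.min(query.length(), suggestion.length());
        int i = 0;
        while (i < limit && query.charAt(i) == suggestion.charAt(i)) {
            i++;
        }
        return i;
    }

    // score = suggestion_weight * boost, with boost = matched prefix characters.
    static long score(int suggestionWeight, String query, String suggestion) {
        return (long) suggestionWeight * commonPrefixLength(query, suggestion);
    }

    public static void main(String[] args) {
        // Both suggestions get the same (assumed) weight of 2.
        System.out.println("separate: " + score(2, "seper", "separate"));
        System.out.println("superstitious: " + score(2, "seper", "superstitious"));
    }
}
```

With equal weights, separate shares three leading characters with seper while superstitious shares one, so separate scores higher.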

      ContextQuery

      Return documents that match a CompletionQuery filtered and/or boosted by provided context(s).

      ContextQuery(CompletionQuery query)
      contextQuery.addContext(CharSequence context, int boost, boolean exact)
      

      NOTE: ContextQuery should be used with ContextSuggestField to query suggestions boosted and/or filtered by contexts.
      Running ContextQuery against a SuggestField will error out.

      Scoring

      suggestion_weight * context_boost
      where suggestion_weight and context_boost are both integers.

      When used with FuzzyCompletionQuery,
      suggestion_weight * (context_boost + fuzzy_boost)
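The two scoring formulas can be combined into one small sketch. It assumes that at least one context was added to the query, that a suggestion whose context was not added is filtered out, and that a non-fuzzy inner query contributes a fuzzy_boost of 0 (so the second formula reduces to the first). All names and the boost values in main are hypothetical illustration choices, not Lucene API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical illustration of context scoring; not Lucene code.
class ContextScoreSketch {
    // contextBoosts holds the query-time boost per context.
    // A context missing from the map is treated as filtered out (assumption).
    static OptionalLong score(Map<String, Integer> contextBoosts, String context,
                              int suggestionWeight, int fuzzyBoost) {
        Integer contextBoost = contextBoosts.get(context);
        if (contextBoost == null) {
            return OptionalLong.empty(); // suggestion does not match any queried context
        }
        // suggestion_weight * (context_boost + fuzzy_boost)
        return OptionalLong.of((long) suggestionWeight * (contextBoost + fuzzyBoost));
    }

    public static void main(String[] args) {
        Map<String, Integer> boosts = new HashMap<>();
        boosts.put("rock", 3);
        boosts.put("indie", 2);
        System.out.println(score(boosts, "rock", 4, 0));  // plain context boost
        System.out.println(score(boosts, "indie", 4, 5)); // with a fuzzy boost
        System.out.println(score(boosts, "jazz", 4, 0));  // filtered out
    }
}
```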

      Context Suggest Field

      To use ContextQuery, use ContextSuggestField instead of SuggestField. Any CompletionQuery can be used with
      ContextSuggestField; the default behaviour is to return suggestions from all contexts. The context for every completion hit
      can be accessed through SuggestScoreDoc#context.

      ContextSuggestField(String name, Collection<CharSequence> contexts, String value, int weight) 
      

      Attachments

        1. LUCENE-6459.patch
          227 kB
          Areek Zillur
        2. LUCENE-6459.patch
          227 kB
          Areek Zillur
        3. LUCENE-6459.patch
          227 kB
          Areek Zillur
        4. LUCENE-6459.patch
          227 kB
          Areek Zillur
        5. LUCENE-6459.patch
          241 kB
          Areek Zillur
        6. LUCENE-6459.patch
          241 kB
          Areek Zillur
        7. LUCENE-6459.patch
          172 kB
          Areek Zillur
        8. LUCENE-6459.patch
          171 kB
          Areek Zillur
        9. LUCENE-6459.patch
          170 kB
          Areek Zillur
        10. LUCENE-6459.patch
          168 kB
          Areek Zillur
        11. LUCENE-6459.patch
          167 kB
          Areek Zillur
        12. LUCENE-6459.patch
          166 kB
          Areek Zillur
        13. LUCENE-6459.patch
          161 kB
          Areek Zillur
        14. LUCENE-6459.patch
          154 kB
          Areek Zillur

          People

            Assignee: Areek Zillur
            Reporter: Areek Zillur
            Votes: 0
            Watchers: 4