  1. Lucene - Core
  2. LUCENE-6339

[suggest] Near real time Document Suggester

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0
    • Fix Version/s: 5.1, 6.0
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

    Description

      The idea is to index documents with one or more SuggestFields and then suggest documents whose SuggestField value matches a given key.
      Each SuggestField can be assigned a numeric weight, which is used to score the suggestion at query time.

      Documents can be suggested on any indexed SuggestField. The document suggester filters out deleted documents in near real time. It can also filter out documents at query time based on a Filter (note: may change to a non-scoring query?).

      A custom postings format (CompletionPostingsFormat) is used to index SuggestField(s) and perform document suggestions.
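
To make the intended behavior concrete, here is a minimal in-memory sketch of the semantics described above: prefix matching on a suggest value, skipping documents that are deleted or rejected by a filter, and ranking by weight. It is only an illustration and not the patch's implementation; the names ToySuggester, Entry, and the accept predicate are hypothetical, and plain prefix matching stands in for the analyzed completion lookup done by the postings format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;

// Hypothetical toy model; it does not use or mirror any Lucene class.
final class ToySuggester {
    /** One indexed suggestion: owning document, suggest value, and weight. */
    static final class Entry {
        final int docId;
        final String value;
        final long weight;

        Entry(int docId, String value, long weight) {
            this.docId = docId;
            this.value = value;
            this.weight = weight;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(int docId, String value, long weight) {
        entries.add(new Entry(docId, value, weight));
    }

    /**
     * Returns up to num entries whose value starts with key, heaviest first.
     * The accept predicate plays the role of live docs plus an optional filter:
     * documents it rejects are never suggested.
     */
    List<Entry> suggest(String key, int num, IntPredicate accept) {
        return entries.stream()
            .filter(e -> e.value.startsWith(key))
            .filter(e -> accept.test(e.docId))
            .sorted((a, b) -> Long.compare(b.weight, a.weight))
            .limit(num)
            .collect(Collectors.toList());
    }
}
```

In this sketch, calling suggest("titl", 2, doc -> doc != 3) treats document 3 as deleted and returns at most two of the remaining matches, ordered by descending weight.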

      Usage

        // hook up custom postings format
        // indexAnalyzer for SuggestField
        Analyzer analyzer = ...
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        Codec codec = new Lucene50Codec() {
          PostingsFormat completionPostingsFormat = new Completion50PostingsFormat();
      
          @Override
          public PostingsFormat getPostingsFormatForField(String field) {
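            // isSuggestField is an application-defined helper (not shown) that
            // decides which fields use the completion postings format
            if (isSuggestField(field)) {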
              return completionPostingsFormat;
            }
            return super.getPostingsFormatForField(field);
          }
        };
        config.setCodec(codec);
        IndexWriter writer = new IndexWriter(dir, config);
        // index some documents with suggestions
        Document doc = new Document();
        doc.add(new SuggestField("suggest_title", "title1", 2));
        doc.add(new SuggestField("suggest_name", "name1", 3));
        writer.addDocument(doc);
        ...
        // open an nrt reader for the directory
        DirectoryReader reader = DirectoryReader.open(writer, false);
        // SuggestIndexSearcher is a thin wrapper over IndexSearcher
        // queryAnalyzer will be used to analyze the query string
        SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer);
        
        // suggest 10 documents for "titl" on "suggest_title" field
        TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
      

      Indexing

      The index analyzer is set through IndexWriterConfig.

      SuggestField(String name, String value, long weight) 
      

      Query

      The query analyzer is set through SuggestIndexSearcher.
      Hits are collected in descending order of suggestion weight.

      // full options for TopSuggestDocs (TopDocs)
      TopSuggestDocs suggest(String field, CharSequence key, int num, Filter filter)
      
      // full options for Collector
      // note: only collects, does not score
      void suggest(String field, CharSequence key, int num, Filter filter, TopSuggestDocsCollector collector) 
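
As a rough picture of what collecting the top num hits by weight involves, the sketch below keeps a bounded min-heap so that only the num heaviest hits survive, then reports them heaviest first. This is an assumption-laden illustration, not the TopSuggestDocsCollector from the patch: the class TopNByWeight and its methods are hypothetical, and hits are allowed to arrive in any order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical bounded collector; unrelated to the actual Lucene classes.
final class TopNByWeight {
    private final int num;
    // Each element is {weight, docId}; the lightest retained hit sits on top.
    private final PriorityQueue<long[]> heap =
        new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));

    TopNByWeight(int num) {
        this.num = num;
    }

    /** Offers one hit; it is kept only if it is among the num heaviest seen so far. */
    void collect(int docId, long weight) {
        if (num <= 0) {
            return;
        }
        if (heap.size() < num) {
            heap.add(new long[] {weight, docId});
        } else if (weight > heap.peek()[0]) {
            heap.poll(); // evict the lightest retained hit
            heap.add(new long[] {weight, docId});
        }
    }

    /** Returns the retained doc ids in descending order of weight. */
    List<Integer> topDocs() {
        List<long[]> sorted = new ArrayList<>(heap);
        sorted.sort((a, b) -> Long.compare(b[0], a[0]));
        List<Integer> docs = new ArrayList<>();
        for (long[] hit : sorted) {
            docs.add((int) hit[1]);
        }
        return docs;
    }
}
```

The heap never holds more than num entries, so memory stays bounded no matter how many documents match the key.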
      

      Analyzer

      CompletionAnalyzer can be used to wrap another analyzer in order to tune parameters that apply only to suggest fields.

      CompletionAnalyzer(Analyzer analyzer, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions)
      

      Attachments

        1. LUCENE-6339.patch
          119 kB
          Areek Zillur
        2. LUCENE-6339.patch
          117 kB
          Areek Zillur
        3. LUCENE-6339.patch
          116 kB
          Areek Zillur
        4. LUCENE-6339.patch
          116 kB
          Areek Zillur
        5. LUCENE-6339.patch
          110 kB
          Areek Zillur
        6. LUCENE-6339.patch
          108 kB
          Areek Zillur
        7. LUCENE-6339.patch
          109 kB
          Areek Zillur

          People

            Assignee: Areek Zillur
            Reporter: Areek Zillur
            Votes: 0
            Watchers: 8
