Lucene - Core
  LUCENE-6339

[suggest] Near real time Document Suggester

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0
    • Fix Version/s: 5.1, 6.0
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The idea is to index documents with one or more SuggestField(s) and be able to suggest documents with a SuggestField value that matches a given key.
      A SuggestField can be assigned a numeric weight to be used to score the suggestion at query time.

      Document suggestion can be done on an indexed SuggestField. The document suggester can filter out deleted documents in near real-time. The suggester can filter out documents based on a Filter (note: may change to a non-scoring query?) at query time.

      A custom postings format (CompletionPostingsFormat) is used to index SuggestField(s) and perform document suggestions.

      Usage

        // hook up custom postings format
        // indexAnalyzer for SuggestField
        Analyzer analyzer = ...
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        Codec codec = new Lucene50Codec() {
          PostingsFormat completionPostingsFormat = new Completion50PostingsFormat();
      
          @Override
          public PostingsFormat getPostingsFormatForField(String field) {
            if (isSuggestField(field)) {
              return completionPostingsFormat;
            }
            return super.getPostingsFormatForField(field);
          }
        };
        config.setCodec(codec);
        IndexWriter writer = new IndexWriter(dir, config);
        // index some documents with suggestions
        Document doc = new Document();
        doc.add(new SuggestField("suggest_title", "title1", 2));
        doc.add(new SuggestField("suggest_name", "name1", 3));
    writer.addDocument(doc);
    ...
    // open an NRT reader for the directory
    DirectoryReader reader = DirectoryReader.open(writer, false);
        // SuggestIndexSearcher is a thin wrapper over IndexSearcher
        // queryAnalyzer will be used to analyze the query string
        SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer);
        
        // suggest 10 documents for "titl" on "suggest_title" field
        TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
      

      Indexing

      The index analyzer is set through IndexWriterConfig.

      SuggestField(String name, String value, long weight) 
      

      Query

      The query analyzer is set through SuggestIndexSearcher.
      Hits are collected in descending order of the suggestion's weight.

      // full options for TopSuggestDocs (TopDocs)
      TopSuggestDocs suggest(String field, CharSequence key, int num, Filter filter)
      
      // full options for Collector
      // note: only collects, does not score
      void suggest(String field, CharSequence key, int num, Filter filter, TopSuggestDocsCollector collector) 
      

      Analyzer

      CompletionAnalyzer can be used to wrap another analyzer in order to tune suggest-field-only parameters.

      CompletionAnalyzer(Analyzer analyzer, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions)
      
      1. LUCENE-6339.patch
        119 kB
        Areek Zillur
      2. LUCENE-6339.patch
        117 kB
        Areek Zillur
      3. LUCENE-6339.patch
        116 kB
        Areek Zillur
      4. LUCENE-6339.patch
        116 kB
        Areek Zillur
      5. LUCENE-6339.patch
        110 kB
        Areek Zillur
      6. LUCENE-6339.patch
        108 kB
        Areek Zillur
      7. LUCENE-6339.patch
        109 kB
        Areek Zillur

        Activity

        Areek Zillur added a comment - edited

        Initial patch; needs more unit tests.
        The custom postings format was originally a fork of Completion090PostingsFormat. Document suggestion uses the same TopNSearcher as AnalyzingSuggester.

        Would be awesome to get some feedback on the patch!

        Michael McCandless added a comment -

        This looks really nice!

        I think AutomatonUtil is (nearly?) the same thing as
        TokenStreamToAutomaton? Can we somehow consolidate the two?

        When I try to "ant test" with the patch on current 5.x some things are
        angry:

            [mkdir] Created dir: /l/areek/lucene/build/suggest/classes/java
            [javac] Compiling 65 source files to /l/areek/lucene/build/suggest/classes/java
            [javac] /l/areek/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.java:597: warning: [cast] redundant cast to TopFieldDocs
            [javac]       TopFieldDocs hits = (TopFieldDocs) c.topDocs();
            [javac]                           ^
            [javac] /l/areek/lucene/suggest/src/java/org/apache/lucene/search/suggest/document/NRTSuggester.java:208: error: local variable collector is accessed from within inner class; needs to be declared final
            [javac]               collector.collect(docID);
            [javac]               ^
            [javac] /l/areek/lucene/suggest/src/java/org/apache/lucene/search/suggest/document/CompletionFieldsProducer.java:164: error: CompletionFieldsProducer.CompletionsTermsReader is not abstract and does not override abstract method getChildResources() in Accountable
            [javac]   private class CompletionsTermsReader implements Accountable {
            [javac]           ^
            [javac] Note: Some input files use or override a deprecated API.
            [javac] Note: Recompile with -Xlint:deprecation for details.
            [javac] 2 errors
            [javac] 1 warning
        

        Not sure why we need an FSTBuilder inside the NRTSuggesterBuilder;
        can't the former be absorbed into the latter? Can NRTSuggesterBuilder
        be package private? Ie the public API here is the postings format and
        SuggestIndexSearcher / SuggestTopDocs? I think other things can be
        private, e.g. CompletionTokenStream.

        Can you use CodecUtil.writeIndexHeader when storing the FST? It also
        stores the segment ID and file extension in the header. And then
        CodecUtil.checkIndexHeader at read-time.

        CompletionTermsReader.lookup() should be sync'd? Else two threads
        could try to use the IndexInput (dictIn) at once?

        Maybe we should move the code in SuggestIndexSearcher.suggest into
        a new TopSuggestDocs.merge method?

        Do we really need the separate SegmentLookup interface? Seems like we
        can just invoke lookup method directly on CompletionTerms?

        Why do we allow -1 weight? And why do we restrict to int not long
        (other suggesters are long I think, though it does seem like
        overkill!).

        Simon Willnauer added a comment -

        Hey Areek, I agree with mike this looks awesome... lemme give you some comments

        • can we make CompletionAnalyzer immutable by any chance? I'd really like to not have setters if possible? For that I guess its constants need to be public as well?
        • is private boolean isReservedInputCharacter(char c) needed, since we check it again afterwards in the checkKey method? Maybe you just want to use a switch here?
        • In CompletionFieldsConsumer#close() I think we need to make sure IOUtils.close(dictOut) is also called if an exception is hit?
        • do we need the extra InputStreamDataInput in CompletionTermWriter#parse? I mean, it's a byte input stream, so we should be able to read all of the bytes?
        • SuggestPayload doesn't need a default ctor
        • can we use if (success == false) instead of if (!success) as a pattern in general?
        • use try / finally in CompletionFieldsProducer#close() to ensure all resources are closed, or pass both the dict and delegateFieldsProducer to IOUtils#close()?
        • you fetch the checksum for the dict file in the CompletionFieldsProducer ctor via CodecUtil.retrieveChecksum(dictIn) but ignore its return value, was this intended? I think you don't wanna do that here? Did you intend to check the entire file?
        • I wonder if we should just write one file for both the index and the FSTs? What's the benefit from having two?

        For loading the dict you put a comment in there saying // is there a better way of doing this?

        I think what you need to do is this:

        public synchronized SegmentLookup lookup() throws IOException {
          if (lookup == null) {
             try (IndexInput dictClone = dictIn.clone()) { // let multiple fields load concurrently
                 dictClone.seek(offset); // this is your field private clone 
                 lookup = NRTSuggester.load(dictClone);
             }
          }
          return lookup;
        }
        

        I'd appreciate a test that this works just fine, i.e. loading multiple FSTs concurrently.
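A minimal sketch of such a test, with a synchronized lazy loader standing in for CompletionTermsReader.lookup(); LazyLookup and its load counter are illustrative names, not the patch's actual classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for CompletionTermsReader: the expensive FST load
// must happen at most once, no matter how many threads race on lookup().
class LazyLookup {
  final AtomicInteger loads = new AtomicInteger();
  private Object lookup;

  synchronized Object lookup() {
    if (lookup == null) {
      loads.incrementAndGet(); // the expensive NRTSuggester.load would go here
      lookup = new Object();
    }
    return lookup;
  }

  public static void main(String[] args) throws Exception {
    LazyLookup reader = new LazyLookup();
    ExecutorService pool = Executors.newFixedThreadPool(8);
    List<Future<Object>> results = new ArrayList<>();
    for (int i = 0; i < 8; i++) {
      results.add(pool.submit(reader::lookup));
    }
    Object first = results.get(0).get();
    for (Future<Object> f : results) {
      if (f.get() != first) throw new AssertionError("different lookup instances");
    }
    pool.shutdown();
    if (reader.loads.get() != 1) throw new AssertionError("loaded more than once");
  }
}
```

The real test would additionally load several distinct fields' FSTs in parallel, but the single-load invariant under contention is the core of it.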

        I didn't get further than this due to the lack of time but I will come back to this either today or tomorrow. Good stuff Areek

        Areek Zillur added a comment -

        Thanks Michael McCandless and Simon Willnauer for the feedback!

        When I try to "ant test" with the patch on current 5.x some things are
        angry

        This is fixed.
        Hmm, interestingly enough those errors do not show up for me, using Java 8.

        Updated Patch:

        • removed private boolean isReservedInputCharacter(char c) and moved reserved input char check to toAutomaton(CharSequence key)
        • use CodecUtil.checkIndexHeader && CodecUtil.writeIndexHeader for all files in custom postings format
        • use if (success == false) instead of if (!success)
        • proper sync for loading FSTs concurrently
        • added TopSuggestDocs.merge method
        • make sure CompletionFieldsConsumer#close() and CompletionFieldsProducer#close() properly handle closing resources
        • removed SegmentLookup interface; use NRTSuggester directly
        • fixed weight check to not allow negative weights; allow long values
        • removed FSTBuilder and made NRTSuggesterBuilder & CompletionTokenStream package-private

        Still TODO:

        • consolidate AutomatonUtil and TokenStreamToAutomaton
        • make CompletionAnalyzer immutable
        • remove use of extra InputStreamDataInput in CompletionTermWriter#parse
        • test loading multiple FSTs concurrently
        • more unit tests
        Areek Zillur added a comment -

        you fetch the checksum for the dict file in the CompletionFieldsProducer ctor via CodecUtil.retrieveChecksum(dictIn) but ignore its return value, was this intended? I think you don't wanna do that here? Did you intend to check the entire file?
        I wonder if we should just write one file for both the index and the FSTs? What's the benefit from having two?

        This was intentional; I used the same convention as BlockTreeTermsReader#termsIn here. The thought was that doing the checksum check would be very costly, as in most cases the dict file would be large?
        If we write one file instead of two, then the checksum check would be more expensive for the index than it is now?

        Areek Zillur added a comment -

        Updated Patch:

        • nuke AutomatonUtil
        • make CompletionAnalyzer immutable
        • add tests
        • minor fixes
        Michael McCandless added a comment -

        New patch looks great, thanks Areek Zillur!

        In TopSuggestDocsCollector:

        • In collect, we seem to assume the suggest searcher will never call
          collect more than num times? How is that? If so, can you add that to
          the javadocs, and maybe add an assert upto < num in collect?
        • Can we just allocate scoreDocs up front instead of lazily?
        • In the javadocs, instead of "one hit can be..." maybe "one doc can
          be..."? Hit is a tricky word in this context since it could be a doc
          or a suggestion...

        In SuggestIndexSearcher, does it really ever make sense to take a
        generic Collector/LeafCollector? Can we instead just strongly type
        the params to all the methods to be TopSuggestDocsCollector?

        "In case a filter has to be applied, the queue size is doubled" is not
        quite correct? Maybe change the logic there so the int queueSize is
        first computed, and then if filter is enabled, it's doubled?

        Can we remove the separate WeightProcessor class and just make
        encode/decode static methods on NRTSuggester? We can add back
        abstractions later if users somehow need control over weight
        encoding...
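        Since the FST top-N search in other Lucene suggesters retrieves paths with minimal output, one plausible shape for those static methods is to store weights inverted; WeightCodec is a hypothetical name and this is not necessarily the patch's actual encoding:

```java
// Hypothetical sketch of static weight encode/decode for an FST-based
// suggester: weights are stored inverted so that a larger weight maps to a
// smaller stored output, and a min-output top-N search over the FST surfaces
// the highest-weighted suggestions first.
class WeightCodec {
  static long encode(long weight) {
    if (weight < 0) { // negative weights are rejected, as in the patch
      throw new IllegalArgumentException("weight must be >= 0");
    }
    return Long.MAX_VALUE - weight;
  }

  static long decode(long encoded) {
    return Long.MAX_VALUE - encoded;
  }
}
```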

        Can we add a test that tests the extreme case of nearly all docs
        filtered out and another test with nearly all docs deleted?

        Areek Zillur added a comment -

        Thanks Michael McCandless for the review!

        In TopSuggestDocsCollector:
        In collect, we seem to assume the suggest searcher will never call
        collect more than num times? How is that? If so, can you add that to
        the javadocs, and maybe add an assert upto < num in collect?
        Can we just allocate scoreDocs up front instead of lazily?
        In the javadocs, instead of "one hit can be..." maybe "one doc can
        be..."? Hit is a tricky word in this context since it could be a doc
        or a suggestion...

        I have rewritten TopSuggestDocsCollector to have a priority queue at the top level instead, somewhat similar to TopDocsCollector.
        Now completions across segments are collected in the same pq; this allows early termination for suggesters at the segment level
        (when a collected completion overflows the pq, we can disregard the rest of the completions for that segment,
        as completions are collected in order of their scores).
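        The collection scheme described above can be sketched with a plain bounded min-heap; TopNSketch and its long[] {score, docId} entries are illustrative stand-ins, not the actual collector:

```java
import java.util.PriorityQueue;

// Sketch of one top-level bounded priority queue shared across segments.
// Within a segment, completions arrive in descending score order, so the
// first completion that cannot enter a full queue lets us early-terminate
// that segment. Entries are {score, docId} pairs.
class TopNSketch {
  final int num;
  final PriorityQueue<long[]> pq =
      new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0])); // min-heap by score

  TopNSketch(int num) {
    this.num = num;
  }

  // Returns false when the current segment can stop producing completions.
  boolean collect(long score, long docId) {
    if (pq.size() < num) {
      pq.offer(new long[] {score, docId});
      return true;
    }
    if (score <= pq.peek()[0]) {
      return false; // scores only decrease within the segment: early-terminate
    }
    pq.poll();
    pq.offer(new long[] {score, docId});
    return true;
  }
}
```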

        In SuggestIndexSearcher, does it really ever make sense to take a
        generic Collector/LeafCollector? Can we instead just strongly type
        the params to all the methods to be TopSuggestDocsCollector?

        Thanks for the suggestion! The generic Collector/LeafCollector is removed.
        Current API:

        public void suggest(String field, CharSequence key, int num, Filter filter, TopSuggestDocsCollector collector) 
        

        "In case a filter has to be applied, the queue size is doubled" is not
        quite correct? Maybe change the logic there so the int queueSize is
        first computed, and then if filter is enabled, it's doubled?

        Now the queueSize is increased by half the # of live docs in the segment instead. If a filter is applied, the queue size should
        be increased w.r.t. the # of documents.
        If the applied filter filters out <= half of the top-scoring documents for a query prefix, then the search is admissible;
        if a filter is too restrictive, then the search is inadmissible. A workaround would be to multiply num by some factor;
        in this case early termination might help (if TopSuggestDocsCollector is initialized with the original num). Thoughts?

        Updated Patch:

        • SuggestIndexSearcher cleanup
        • TopSuggestDocsCollector re-write
        • remove WeightProcessor from NRTSuggester
        • added more tests (including boundary cases for deleted/filtered out documents)
        Areek Zillur added a comment -

        Updated Patch:

        • minor fixes
        Michael McCandless added a comment -

        Patch looks great!

        Can we pull out SuggestScoreDocPQ into its own .java source? Should its lessThan method tie-break by docID?

        I think the logic to compute maxQueueSize in getMaxTopNSearcherQueueSize could possibly overflow int? Maybe use long, and then cast back to int after the Math.min?
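        That suggestion can be sketched as follows; the cap value and the liveDocs/2 filter heuristic here paraphrase the discussion and are not the exact shipped formula:

```java
// Sketch of computing the TopNSearcher queue size in long arithmetic so the
// sum cannot overflow int, then capping and casting back. MAX_QUEUE_SIZE is
// an illustrative cap.
class QueueSizeSketch {
  static final int MAX_QUEUE_SIZE = 1000;

  static int maxQueueSize(int num, int liveDocs, boolean filterEnabled) {
    long size = num; // widen before adding: num + liveDocs/2 can exceed int
    if (filterEnabled) {
      size += liveDocs / 2L; // grow the queue when a filter may reject hits
    }
    return (int) Math.min(size, MAX_QUEUE_SIZE);
  }
}
```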

        Areek Zillur added a comment -

        Thanks Michael McCandless for the suggestions!

        Updated Patch:

        • separate out SuggestScoreDocPriorityQueue (break ties with docID)
        • use long to calculate maxQueueSize
        • minor changes
        Michael McCandless added a comment -

        I think the tie break should be a.doc > b.doc, for consistency with Lucene?

        I.e., on a score tie, the smaller doc ID should sort earlier than the bigger doc ID?

        Otherwise +1 to commit! Thanks Areek Zillur!
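        The requested tie-break can be sketched like this; SuggestScoreDoc here is a minimal stand-in for the patch's class:

```java
// Sketch of a lessThan with the tie-break Mike asks for: lower score is
// "less" (popped first from the min-queue); on a score tie the larger doc ID
// is "less", so the smaller doc ID survives in the queue and ends up ranked
// earlier, matching Lucene's usual convention.
class SuggestScoreDoc {
  final int doc;
  final float score;

  SuggestScoreDoc(int doc, float score) {
    this.doc = doc;
    this.score = score;
  }

  static boolean lessThan(SuggestScoreDoc a, SuggestScoreDoc b) {
    if (a.score == b.score) {
      return a.doc > b.doc; // tie: the bigger doc ID loses
    }
    return a.score < b.score;
  }
}
```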

        Uwe Schindler added a comment -

        I just reviewed the patch, too. I like the API, but have not yet looked into it closely like Mike - I just skimmed it.

        Just one question: What happens if 2 documents have the same SuggestField and same suggestion presented to user? This would now produce duplicates, right? I was just thinking about how to prevent that (coming from Elasticsearch world).

        Areek Zillur added a comment -

        Updated Patch:

        • SuggestScoreDocPQ prefers smaller doc id
        • documentation fixes

        I will commit this shortly. Thanks for all the feedback, Michael McCandless & Simon Willnauer!

        Uwe Schindler added a comment -

        +1

        Areek Zillur added a comment -

        Hi Uwe Schindler,
        Thanks for the review!
        If two documents do have the same suggestion for the same SuggestField, it will produce duplicates in terms of the suggestion, but because they come from two documents (different doc ids) they are not considered duplicates.
        Maybe we can add a boolean flag in the NRTSuggester to only collect unique suggestions, but then we would have to decide which suggestion to throw out, as they are now tied to doc ids?

        Uwe Schindler added a comment - edited

        If two documents do have the same suggestion for the same SuggestField, it will produce duplicates in terms of the suggestion, but because they are from two documents (different doc ids) they are not considered as duplicates.

        Yeah, that's what I mean by duplicate. The suggester only returns doc ids. For display to the user, one would read a stored field (the actual suggestion) like you do when presenting search results, and this produces the duplicate. I am not sure how to solve that; it was just an idea. If this is really an issue, one could filter the duplicates afterwards.
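        Filtering afterwards can be sketched as a single pass over the already score-sorted hits, keeping the first (highest-scored) occurrence of each suggestion string; plain strings stand in for the real results here:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of post-hoc duplicate filtering: the suggester's hits are already
// sorted by descending score, so keeping only the first occurrence of each
// suggestion string keeps the best-scored one. Plain strings stand in for
// the (docId, suggestion) pairs a real application would carry.
class DedupSketch {
  static List<String> dedup(List<String> sortedSuggestions) {
    Set<String> seen = new HashSet<>();
    List<String> unique = new ArrayList<>();
    for (String s : sortedSuggestions) {
      if (seen.add(s)) { // add() returns false for a repeat
        unique.add(s);
      }
    }
    return unique;
  }
}
```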

        ASF subversion and git services added a comment -

        Commit 1669698 from Areek Zillur in branch 'dev/trunk'
        [ https://svn.apache.org/r1669698 ]

        LUCENE-6339: Added Near-real time Document Suggester via custom postings format

        Uwe Schindler added a comment -

        Indeed, the suggestion does not need to come from a stored field of the result document, nice! But one could use that to add additional suggestion information, right, instead of using the payload?

        ASF subversion and git services added a comment -

        Commit 1669703 from Areek Zillur in branch 'dev/trunk'
        [ https://svn.apache.org/r1669703 ]

        LUCENE-6339: move changes entry from 6.0.0 to 5.1.0

        Areek Zillur added a comment -

        Yes, Uwe Schindler, that is the idea. The payload option has been removed entirely; instead of using payloads, one can now grab any associated values from the document for each suggestion.

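        A minimal sketch of what this payload-free approach looks like from the caller's side: once the suggester returns doc IDs, any extra display information is looked up per document. Here a plain map stands in for reading stored fields from an index reader; all names are hypothetical, not the Lucene API.

```java
import java.util.*;

// Hypothetical sketch: instead of encoding extra info in a suggestion
// payload, look it up per doc ID after the lookup. The 'storedFields' map
// stands in for reading stored fields from an IndexReader.
public class SuggestionMetadata {
    static final Map<Integer, Map<String, String>> storedFields = Map.of(
        1, Map.of("title", "Lucene in Action", "url", "/books/1"),
        3, Map.of("title", "Lucene Basics", "url", "/books/3"));

    // return all associated values for a suggested document, or an empty
    // map if the doc ID is unknown
    static Map<String, String> metadataFor(int docId) {
        return storedFields.getOrDefault(docId, Map.of());
    }

    public static void main(String[] args) {
        int[] suggestedDocIds = {1, 3};  // as returned by the suggester
        for (int docId : suggestedDocIds) {
            System.out.println(docId + " -> " + metadataFor(docId).get("url"));
        }
    }
}
```

        The design advantage over payloads is that the suggester's FST stays small (no per-entry blobs), and the associated values stay consistent with the rest of the document since they live in the document itself.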
        Areek Zillur added a comment -

        Committed to branch_5x with revision r1669715 (I missed prepending the JIRA issue number to the commit message).

        ASF subversion and git services added a comment -

        Commit 1670969 from Areek Zillur in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1670969 ]

        LUCENE-6339: fix test bug (ensure opening nrt reader with applyAllDeletes)

        ASF subversion and git services added a comment -

        Commit 1670972 from Areek Zillur in branch 'dev/trunk'
        [ https://svn.apache.org/r1670972 ]

        LUCENE-6339: fix test bug (ensure opening nrt reader with applyAllDeletes)

        ASF subversion and git services added a comment -

        Commit 1670978 from Areek Zillur in branch 'dev/branches/lucene_solr_5_1'
        [ https://svn.apache.org/r1670978 ]

        LUCENE-6339: fix test bug (ensure opening nrt reader with applyAllDeletes)

        ASF subversion and git services added a comment -

        Commit 1671187 from Areek Zillur in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1671187 ]

        LUCENE-6339: fix test (ensure the maximum requested size is bounded to 1000)

        ASF subversion and git services added a comment -

        Commit 1671189 from Areek Zillur in branch 'dev/branches/lucene_solr_5_1'
        [ https://svn.apache.org/r1671189 ]

        LUCENE-6339: fix test (ensure the maximum requested size is bounded to 1000)

        ASF subversion and git services added a comment -

        Commit 1671196 from Areek Zillur in branch 'dev/trunk'
        [ https://svn.apache.org/r1671196 ]

        LUCENE-6339: fix test (ensure the maximum requested size is bounded to 1000)

        ASF subversion and git services added a comment -

        Commit 1671914 from Areek Zillur in branch 'dev/trunk'
        [ https://svn.apache.org/r1671914 ]

        LUCENE-6339: fix test (take into account inadmissible filtered search for multiple segments)

        ASF subversion and git services added a comment -

        Commit 1671915 from Areek Zillur in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1671915 ]

        LUCENE-6339: fix test (take into account inadmissible filtered search for multiple segments)

        ASF subversion and git services added a comment -

        Commit 1671916 from Areek Zillur in branch 'dev/branches/lucene_solr_5_1'
        [ https://svn.apache.org/r1671916 ]

        LUCENE-6339: fix test (take into account inadmissible filtered search for multiple segments)

        ASF subversion and git services added a comment -

        Commit 1672458 from Steve Rowe in branch 'dev/trunk'
        [ https://svn.apache.org/r1672458 ]

        LUCENE-6339: Maven config: add resource dir src/resources/ to the POM.

        ASF subversion and git services added a comment -

        Commit 1672459 from Steve Rowe in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1672459 ]

        LUCENE-6339: Maven config: add resource dir src/resources/ to the POM. (merged trunk r1672458)

        ASF subversion and git services added a comment -

        Commit 1672461 from Steve Rowe in branch 'dev/branches/lucene_solr_5_1'
        [ https://svn.apache.org/r1672461 ]

        LUCENE-6339: Maven config: add resource dir src/resources/ to the POM. (merged trunk r1672458)

        Timothy Potter added a comment -

        Bulk close after 5.1 release


          People

          • Assignee: Areek Zillur
          • Reporter: Areek Zillur
          • Votes: 0
          • Watchers: 8