  1. Lucene - Core
  2. LUCENE-7770

BloomFilteringPostingsFormat should implement seekExact(TermState) to avoid seeking within a matching segment/field multiple times

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 5.6
    • Fix Version/s: None
    • Component/s: core/codecs
    • Labels: None
    • Lucene Fields: New

    Description

      BloomFilteringPostingsFormat$BloomFilteredFieldsProducer$BloomFilteredTermsEnum does not reuse the TermState from the initial lookup when a second seek is issued (usually when we have a match and we build the Scorer).

      The default implementation of TermsEnum#seekExact(BytesRef term, TermState state), which the Bloom filter enum inherits, ignores the TermState and calls the regular seekExact(BytesRef) method.

      This means that BloomFilteringPostingsFormat performs a second term lookup for every segment/field that has a match for the term (mostly in the various Weight#scorer implementations).

      I don't think it is a big issue, as we mostly expect search terms to match in a limited number of segments/fields. But for a few queries it could become inefficient.
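
      To illustrate the double lookup, here is a minimal, self-contained sketch. It does not use the Lucene API: TermsCursor, State, DictionaryCursor, FilteredCursor and DelegatingFilteredCursor are hypothetical stand-ins for TermsEnum, TermState, the wrapped codec's enum and the Bloom filter wrapper, and the Bloom filter check is a placeholder that always passes. It only shows how an inherited fallback triggers a second dictionary lookup, and how an override that forwards the state to the delegate could avoid it.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model only: these classes are hypothetical stand-ins, not Lucene's API.
final class SeekExactSketch {

    /** Stand-in for TermState: remembers where a term was found. */
    static final class State {
        final int position;
        State(int position) { this.position = position; }
    }

    /** Stand-in for TermsEnum, with the same kind of fallback the issue describes. */
    abstract static class TermsCursor {
        abstract boolean seekExact(String term);
        abstract State termState();

        // Default behaviour: ignore the state and perform a regular lookup.
        void seekExact(String term, State state) {
            if (!seekExact(term)) {
                throw new IllegalArgumentException("term=" + term + " does not exist");
            }
        }
    }

    /** Stand-in for the wrapped codec's enum; counts term dictionary lookups. */
    static final class DictionaryCursor extends TermsCursor {
        private final Map<String, Integer> dictionary;
        private int current = -1;
        int lookups = 0;

        DictionaryCursor(Map<String, Integer> dictionary) { this.dictionary = dictionary; }

        @Override boolean seekExact(String term) {
            lookups++;
            Integer position = dictionary.get(term);
            current = (position == null) ? -1 : position;
            return position != null;
        }

        @Override State termState() { return new State(current); }

        // Repositions directly from the saved state, without a dictionary lookup.
        @Override void seekExact(String term, State state) { current = state.position; }
    }

    /** Wrapper that inherits the default seekExact(term, state). */
    static class FilteredCursor extends TermsCursor {
        final TermsCursor delegate;
        FilteredCursor(TermsCursor delegate) { this.delegate = delegate; }

        // Placeholder for the Bloom filter membership test (assumption: always "maybe").
        boolean mightContain(String term) { return true; }

        @Override boolean seekExact(String term) {
            return mightContain(term) && delegate.seekExact(term);
        }

        @Override State termState() { return delegate.termState(); }
    }

    /** Same wrapper with the proposed override: forward the state to the delegate. */
    static final class DelegatingFilteredCursor extends FilteredCursor {
        DelegatingFilteredCursor(TermsCursor delegate) { super(delegate); }

        @Override void seekExact(String term, State state) {
            delegate.seekExact(term, state);
        }
    }

    /** Runs "initial lookup, then seek by state" and returns the number of dictionary lookups. */
    static int countLookups(boolean forwardState) {
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("lucene", 7);
        DictionaryCursor inner = new DictionaryCursor(dictionary);
        TermsCursor outer = forwardState
                ? new DelegatingFilteredCursor(inner)
                : new FilteredCursor(inner);

        if (!outer.seekExact("lucene")) {
            throw new IllegalStateException("term should exist");
        }
        State state = outer.termState();   // saved during the initial lookup
        outer.seekExact("lucene", state);  // second seek, e.g. when building a scorer
        return inner.lookups;
    }

    public static void main(String[] args) {
        System.out.println("inherited default:   " + countLookups(false) + " lookups");
        System.out.println("delegating override: " + countLookups(true) + " lookups");
    }
}
```

      In this model the inherited default costs two dictionary lookups per matching segment/field, while the delegating override costs one. Whether the real fix is this simple depends on how the Bloom filter enum creates and holds its delegate, which the sketch does not model.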

          People

            Assignee: Unassigned
            Reporter: Yannis Hector (yhector@salesforce.com)
            Votes: 0
            Watchers: 2
