Lucene - Core / LUCENE-10559

Add preFilter/postFilter options to KnnGraphTester

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.4
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      We want to be able to test the efficacy of pre-filtering in KnnVectorQuery. If you want, say, the top K nearest neighbors subject to a constraint Q, are you better off over-selecting top hits (say 2K) and then filtering them (post-filtering), or incorporating the filter into the query (pre-filtering)? How does the answer depend on the selectivity of the filter?
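
      To make the two strategies concrete, here is a minimal sketch that applies both to an exact, brute-force nearest-neighbor search. It is only an illustration: it does not use the HNSW graph, KnnVectorQuery, or KnnGraphTester, and the class name, method names, and the `overSelect` parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; brute-force search stands in for the graph search. */
final class FilterStrategySketch {

  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Doc ids ordered by distance to the query, limited to `candidates` when it is non-null. */
  static List<Integer> nearest(float[][] vectors, float[] query, BitSet candidates, int limit) {
    List<Integer> docs = new ArrayList<>();
    for (int doc = 0; doc < vectors.length; doc++) {
      if (candidates == null || candidates.get(doc)) {
        docs.add(doc);
      }
    }
    // List.sort is stable, so ties keep doc-id order.
    docs.sort(Comparator.comparingDouble(doc -> squaredDistance(vectors[doc], query)));
    return new ArrayList<>(docs.subList(0, Math.min(limit, docs.size())));
  }

  /** Pre-filtering: only accepted docs are ever considered by the search. */
  static List<Integer> preFilter(float[][] vectors, float[] query, BitSet accepted, int k) {
    return nearest(vectors, query, accepted, k);
  }

  /** Post-filtering: fetch k * overSelect unfiltered hits, then drop the rejected ones. */
  static List<Integer> postFilter(
      float[][] vectors, float[] query, BitSet accepted, int k, int overSelect) {
    List<Integer> result = new ArrayList<>();
    for (int doc : nearest(vectors, query, null, k * overSelect)) {
      if (accepted.get(doc) && result.size() < k) {
        result.add(doc);
      }
    }
    return result;
  }
}
```

      In this sketch, post-filtering can come back with fewer than K hits when the filter rejects most of the over-selected candidates, which is one reason the comparison is expected to depend on selectivity.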

      I think we can get a reasonable testbed by generating a uniform random filter with a given selectivity, in a way that is consistent and repeatable across runs. Possibly we'd also want to try filters that are correlated with index order, but it seems unlikely that such filters would be correlated with vector values in a way that the graph structure would notice, so random is a pretty good starting point for this.
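
      One way such a repeatable random filter could be generated is sketched below. This is an assumption about the approach, not the code that was committed: the class name, method name, and the `seed` parameter are hypothetical, and a plain `java.util.BitSet` stands in for whatever bit set type the tester uses.

```java
import java.util.BitSet;
import java.util.Random;

/** Hypothetical helper, not part of KnnGraphTester. */
final class RandomFilterSketch {

  /**
   * Returns a bit set over [0, numDocs) in which each doc is accepted
   * independently with probability `selectivity`. Using a fixed seed
   * makes the filter identical from run to run.
   */
  static BitSet randomFilter(int numDocs, double selectivity, long seed) {
    if (selectivity < 0.0 || selectivity > 1.0) {
      throw new IllegalArgumentException("selectivity must be in [0, 1]");
    }
    Random random = new Random(seed);
    BitSet accepted = new BitSet(numDocs);
    for (int doc = 0; doc < numDocs; doc++) {
      // nextDouble() is uniform in [0, 1), so this accepts about selectivity * numDocs docs.
      if (random.nextDouble() < selectivity) {
        accepted.set(doc);
      }
    }
    return accepted;
  }
}
```

      Because each doc is accepted independently, the filter is uncorrelated with both index order and vector values, and the actual fraction of accepted docs is only approximately equal to the requested selectivity.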

            People

              Assignee: Unassigned
              Reporter: Michael Sokolov (sokolov)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 7h 20m