[LUCENE-10559] Add preFilter/postFilter options to KnnGraphTester - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 9.4
Component/s: None
Labels: None
Lucene Fields: New

Description

We want to be able to test the efficacy of pre-filtering in KnnVectorQuery: if you (say) want the top K nearest neighbors subject to a constraint Q, are you better off over-selecting top hits (say 2K) and then filtering them (post-filtering), or incorporating the filter into the query itself (pre-filtering)? How does the answer depend on the selectivity of the filter?

I think we can get a reasonable testbed by generating a uniform random filter with some selectivity (that is consistent and repeatable). Possibly we'd also want to try filters that are correlated with index order, but it seems they'd be unlikely to be correlated with vector values in a way that the graph structure would notice, so random is a pretty good starting point for this.
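
To make the comparison concrete, here is a minimal, self-contained sketch of the idea. It is not the KnnGraphTester code from the pull request. The class `FilterSketch` and all of its method names are hypothetical, `java.util.BitSet` stands in for whatever bit set the real tester uses, and the search is exact brute force rather than an HNSW graph search, so it only illustrates how a seeded random filter with a given selectivity interacts with over-selection.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration only; names and structure are assumptions, not Lucene APIs. */
final class FilterSketch {

  /** Each doc passes independently with probability `selectivity`; a fixed seed makes the filter repeatable. */
  static BitSet randomFilter(int numDocs, double selectivity, long seed) {
    Random random = new Random(seed);
    BitSet accepted = new BitSet(numDocs);
    for (int doc = 0; doc < numDocs; doc++) {
      if (random.nextDouble() < selectivity) {
        accepted.set(doc);
      }
    }
    return accepted;
  }

  static double squaredDistance(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /** Exact top-k by brute force, restricted to `accepted` docs (null means no filter). */
  static List<Integer> topK(float[][] vectors, float[] query, int k, BitSet accepted) {
    List<Integer> candidates = new ArrayList<>();
    for (int doc = 0; doc < vectors.length; doc++) {
      if (accepted == null || accepted.get(doc)) {
        candidates.add(doc);
      }
    }
    candidates.sort(
        Comparator.comparingDouble((Integer doc) -> squaredDistance(vectors[doc], query)));
    return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
  }

  /** Pre-filtering: the filter is applied while searching. */
  static List<Integer> preFilter(float[][] vectors, float[] query, int k, BitSet accepted) {
    return topK(vectors, query, k, accepted);
  }

  /** Post-filtering: over-select `overSelect` unfiltered hits, then drop those that fail the filter. */
  static List<Integer> postFilter(
      float[][] vectors, float[] query, int k, int overSelect, BitSet accepted) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : topK(vectors, query, overSelect, null)) {
      if (accepted.get(doc) && hits.size() < k) {
        hits.add(doc);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    int numDocs = 2000, dim = 8, k = 10;
    Random random = new Random(7L);
    float[][] vectors = new float[numDocs][dim];
    for (float[] vector : vectors) {
      for (int i = 0; i < dim; i++) {
        vector[i] = random.nextFloat();
      }
    }
    float[] query = new float[dim];
    for (int i = 0; i < dim; i++) {
      query[i] = random.nextFloat();
    }
    for (double selectivity : new double[] {0.8, 0.5, 0.1}) {
      BitSet accepted = randomFilter(numDocs, selectivity, 42L);
      List<Integer> exact = preFilter(vectors, query, k, accepted);
      List<Integer> found = postFilter(vectors, query, k, 2 * k, accepted);
      // With exact search, the post-filter hits are a prefix of the pre-filter hits,
      // so the size ratio is the recall of post-filtering.
      System.out.printf(
          "selectivity=%.1f postFilter found %d of %d%n", selectivity, found.size(), exact.size());
    }
  }
}
```

In this sketch post-filtering can only lose hits, because the brute-force pre-filter is exact. With a real graph search the pre-filtered query is itself approximate and has its own cost, which is the trade-off this issue wants to be able to measure.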

Issue Links

links to

GitHub Pull Request #932

People

Assignee: Unassigned

Reporter: Michael Sokolov

Votes: 0

Watchers: 4

Dates

Created: 04/May/22 21:23

Updated: 28/Aug/22 16:40

Resolved: 29/Jul/22 18:22

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 7h 20m