Details
- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
"fq" filter queries that have cache=false and that aren't processed as a PostFilter (either because they don't implement PostFilter or because they have a cost < 100) are processed in SolrIndexSearcher using a custom Filter implementation built on a cost-ordered series of DocIdSetIterators. This code is not TwoPhaseIterator aware, so the matches() method may be called on docs that ideally would already have been excluded by lower-cost filter queries.
Issue Links
- is required by: SOLR-14164 Replace Solr's FunctionRangeQuery with Lucene's (Patch Available)
- relates to: SOLR-13890 Add postfilter support to {!terms} queries (Closed)
- requires: LUCENE-9114 Add FunctionValues.cost (Closed)
- links to
The PR has the code details, but I want to describe some of the bigger picture here.
I have this as a sub-task of Remove/refactor Filter because it reduces the use of the old Filter abstraction. SolrIndexSearcher.ProcessedFilter.filter is now declared as a Query, and SolrIndexSearcher no longer has FilterImpl. With pf.filter being a Query, SolrIndexSearcher.getDocSet(List<Query> fqs) became simpler, and I was able to remove the similar getDocSetScore.
So how is TwoPhaseIterator used efficiently, you may ask? BooleanQuery's FILTER clauses use it internally via ConjunctionDISI. I modified SolrIndexSearcher.getProcessedFilter to create a BooleanQuery with these FILTER clauses for the non-cached queries.
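To make the two-phase idea concrete, here is a minimal plain-Java sketch of a conjunction that first intersects cheap approximations and only then runs the costly per-document checks, cheapest first. It is a simplified model, not Lucene's ConjunctionDISI: the class TwoPhaseSketch, its Filter type, and the confirm() helper are hypothetical names invented for illustration, and the membership test uses a binary search where Lucene would leapfrog with advance().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

/** Simplified model of a two-phase conjunction; hypothetical, not Lucene code. */
final class TwoPhaseSketch {

    /** A filter with a cheap sorted approximation and a costly per-doc confirmation. */
    static final class Filter {
        final int[] approximation;   // sorted candidate doc ids (phase 1)
        final IntPredicate matches;  // costly check on a candidate (phase 2)
        final float matchCost;       // estimated cost of one matches call
        int matchesCalls = 0;        // counts how often the costly check ran

        Filter(int[] approximation, IntPredicate matches, float matchCost) {
            this.approximation = approximation;
            this.matches = matches;
            this.matchCost = matchCost;
        }

        boolean confirm(int doc) {
            matchesCalls++;
            return matches.test(doc);
        }
    }

    /** Intersects the approximations, then confirms in ascending matchCost order. */
    static List<Integer> conjunction(List<Filter> filters) {
        List<Filter> byCost = new ArrayList<>(filters);
        byCost.sort(Comparator.comparingDouble((Filter f) -> f.matchCost));

        List<Integer> hits = new ArrayList<>();
        candidates:
        for (int doc : filters.get(0).approximation) {
            // Phase 1: the doc must appear in every approximation.
            for (Filter f : filters) {
                if (Arrays.binarySearch(f.approximation, doc) < 0) continue candidates;
            }
            // Phase 2: run the costly checks cheapest first, stopping at the first failure.
            for (Filter f : byCost) {
                if (!f.confirm(doc)) continue candidates;
            }
            hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] all = IntStream.rangeClosed(1, 12).toArray();
        Filter term = new Filter(new int[] {2, 3, 4, 6, 8, 9, 12}, d -> true, 0f);
        Filter even = new Filter(all, d -> d % 2 == 0, 10f);
        Filter div3 = new Filter(all, d -> d % 3 == 0, 100f);

        List<Integer> hits = conjunction(List.of(div3, even, term));
        System.out.println(hits + ", costly div3 checks: " + div3.matchesCalls);
    }
}
```

In this toy run the most expensive check (div3) is evaluated on only 5 of the 12 docs, because the term approximation and the cheaper even check have already rejected the rest. That is the saving a cost-ordered series of plain DocIdSetIterators cannot provide.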
Unfortunately, the "cost" param on these non-cached filter queries no longer has any meaning. Instead, the Queries themselves and any TPIs they may have ought to have suitable costs, and these are not externally configurable. Maybe we could make a wrapping query that wraps the underlying TPI.matchCost... or just not bother, letting the queries themselves compute an internal cost that is perhaps better than whatever the user supplies. I lean toward not bothering; less complexity. Unfortunately, ValueSourceScorer's TPI matchCost is a constant 100 instead of varying based on the particular FunctionValues implementation. That should be its own issue to address.