So... why not do this in CachingWrapper/SpanFilter, but, instead of discarding the cache entry when deletions must be enforced, we dynamically apply the deletions? (I think we could use FilteredDocIdSet).
Yea, that would work well. You will still need some way to know when to enable or disable this based on the filter you use (it should basically only be enabled for ones that are passed to constant score...
OK I'll take that approach on next iter.
But: I think this may need to be enabled in other cases where the
filter is used (ie not only CSQ). Sure, CSQ is the one example we
have today, where if you pass a Filter that ignores "recent" deletions
you'll be in trouble... but who knows what other uses of a Filter
might trip up on this intentional cache-incoherence we're introducing.
Agreed. As I see it, caching based on IndexReader is key in Lucene, and with NRT it should feel the same as it does without it. NRT should not change the way you build your system.
Well... NRT and up-to-date deletions will always present a challenge.
Really, this tradeoff we are making here, where a cached filter can be
set to either 1) ignore new deletions, 2) discard its cache entry and
fully regenerate itself, or 3) dynamically intersect the deletions, is
similar to the discussions we've had about just how an NRT segment
reader should enforce recent deletions.
Ie, ignoring option 1 (which of course gives the best perf), option 2,
while making a reopen more costly, gets you the best search
performance (since only one bit set is checked during searches).
Option 3 makes reopens much faster, but then search performance takes a
hit (since you're checking 2 bit sets).
Option 2 is analogous to how Lucene now handles the per-segment
deleted docs bit vector (it's fully recreated on each reopen), while
option 3 is analogous to how Zoie handles deletions (new deletions are
dynamically applied to all search hits).
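
To make the tradeoff concrete, here is a rough sketch of the three options
using plain java.util.BitSet. This is a toy model, not Lucene code: the class
and method names are made up for illustration, and option 2 is approximated
by a one-time pass that clears deleted docs (a real filter would be re-run
against the reopened reader).

```java
import java.util.BitSet;

/** Toy model of a cached per-segment filter. Hypothetical names, not Lucene API. */
public class CachedFilterSketch {

    /** Option 1: reuse the cached bits as-is; new deletions are ignored. */
    public static BitSet ignoreDeletions(BitSet cached) {
        return cached;
    }

    /**
     * Option 2: pay once at reopen to build a fresh bit set with deleted
     * docs cleared. Searches afterwards check a single bit set.
     */
    public static BitSet regenerate(BitSet cached, BitSet deleted) {
        BitSet fresh = (BitSet) cached.clone();
        fresh.andNot(deleted);
        return fresh;
    }

    /**
     * Option 3: keep the cached bits untouched and consult the deletions
     * for every candidate doc at search time (two bit sets per check).
     */
    public static boolean matches(BitSet cached, BitSet deleted, int doc) {
        return cached.get(doc) && !deleted.get(doc);
    }

    public static void main(String[] args) {
        BitSet cached = new BitSet();
        cached.set(1);
        cached.set(3);
        cached.set(5);

        BitSet deleted = new BitSet();
        deleted.set(3); // doc 3 was deleted after the filter was cached

        System.out.println("option 1: " + ignoreDeletions(cached));
        System.out.println("option 2: " + regenerate(cached, deleted));
        System.out.println("option 3, doc 3 matches: " + matches(cached, deleted, 3));
    }
}
```

Options 2 and 3 accept the same docs; the only difference is whether the
cost of applying deletions is paid once at reopen or on every doc check
during search.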