This was an excellent idea, and it's great that it uncovered some
dangerous and very unexpected places where we are passing a top-level
reader to the FieldCache (eg that explain() could suddenly populate
the FieldCache w/ top-level values is quite shocking!).
ReaderUtil.subSearcher is doing the same thing as
I love the RAMUsageEstimator... we have other places that estimate RAM
(eg IndexWriter does so for added & deleted docs) that we should
eventually cutover to this new API.
I particularly love the new class named Insanity:
public static Insanity checkSanity(FieldCache cache)
MultiDocIdSet/Iterator makes me a bit nervous, because it's further
"propagating" a non-segment-based iterator deeper into Lucene than I
think we want to. It's similar to eg using
DirectoryReader.MultiTermDocs (what Lucene used to do), instead of
stepping through the segments yourself.
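
To make "stepping through the segments yourself" concrete, here is a minimal sketch that walks each segment's matching docs in turn and re-bases them onto top-level doc ids, instead of hiding the segments behind one merged iterator. Everything in it is an assumption for illustration: PerSegmentWalk, collectTopLevel and the plain int arrays are hypothetical stand-ins, not Lucene classes or the patch's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these names exist in Lucene. */
final class PerSegmentWalk {
    private PerSegmentWalk() {}

    /**
     * Visits segments one at a time and converts each segment-local doc id
     * to a top-level doc id by adding that segment's doc base.
     *
     * @param localDocsPerSegment matching doc ids per segment, segment-local
     * @param maxDocPerSegment    number of docs in each segment (assumed layout)
     */
    static List<Integer> collectTopLevel(int[][] localDocsPerSegment, int[] maxDocPerSegment) {
        if (localDocsPerSegment.length != maxDocPerSegment.length) {
            throw new IllegalArgumentException("one maxDoc is required per segment");
        }
        List<Integer> topLevelDocs = new ArrayList<>();
        int docBase = 0; // top-level id of the current segment's first doc
        for (int seg = 0; seg < localDocsPerSegment.length; seg++) {
            for (int localDoc : localDocsPerSegment[seg]) {
                topLevelDocs.add(docBase + localDoc);
            }
            docBase += maxDocPerSegment[seg]; // the next segment starts here
        }
        return topLevelDocs;
    }
}
```

The point of the shape is that the caller always knows which segment it is in, so any per-segment state (eg FieldCache entries) can be keyed on the segment rather than on the top-level reader.
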
Also, shouldn't explain most closely match what was done during
searching (ie, run "per segment")? So simply pushing explain down to
the sub-reader that has the doc seems appropriate? Ie we want it to
share as much of the code path as possible with how searching was in
EG for ConstantScoreQuery.explain, it seems like we should 1) locate
the sub-reader that this doc falls in, and 2) get a scorer against
that reader, then 3) build up the explanation from that? Likewise for
In fact... maybe we should simply fix IndexSearcher.explain to do
this for all queries? Ie, get the top-level weight, locate the sub-reader
that has the doc, un-base the doc, and then invoke QueryWeight.explain
with that sub-reader and un-based doc? Then we don't have to do
anything special for each query. I think QueryWeight.scorer()
shouldn't be expected to handle a "top level reader" being passed in.
Ie, higher up in Lucene we should do that switch, so that we don't
have to do it (this "valuesFromSubReaders" arg) for every scorer.
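
The "locate sub-reader, un-base the doc" step above could be sketched roughly as follows. This is only an illustration under stated assumptions: SubReaderLocator and its methods are hypothetical names, and docStarts is assumed to hold the top-level doc id of the first document in each sub-reader, sorted ascending and starting at 0.

```java
/** Hypothetical helper for illustration; not the patch's code. */
final class SubReaderLocator {
    private SubReaderLocator() {}

    /**
     * Returns the index of the sub-reader containing the top-level doc id,
     * ie the last index i with docStarts[i] <= doc. Picking the last such
     * index skips over empty sub-readers that share the same start.
     */
    static int subIndex(int doc, int[] docStarts) {
        if (docStarts.length == 0 || doc < docStarts[0]) {
            throw new IllegalArgumentException("doc " + doc + " is before the first sub-reader");
        }
        int lo = 0;
        int hi = docStarts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always makes progress
            if (docStarts[mid] <= doc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Converts a top-level doc id into the id local to its sub-reader. */
    static int unbase(int doc, int[] docStarts) {
        return doc - docStarts[subIndex(doc, docStarts)];
    }
}
```

With something like this, a top-level explain would compute the sub-reader index and the un-based doc once, then hand that sub-reader and local doc to the weight's explain, so individual queries never see a top-level reader. The sketch does not check doc against the total doc count; a real version would need the overall maxDoc for that.
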
Hmm: why do we even have explain at both the QueryWeight and Scorer
"levels"? It seems like we should pick one level and do it there,
consistently. Most queries seem to only implement the QueryWeight one
and often simply throw UOE in the Scorer's explain, but eg PhraseQuery
implements it in both places.
(BTW: I'll be offline for approx the next 36 hours!)