Regarding the nocommit in ConjunctionScorer's ctor - I think the problem with moving doNext() to the end of the ctor is the order of the scorers that will be iterated on. I.e., currently the scorers are first advanced to their first doc, then sorted based on their docID, then advanced to the doc ID that all agree on, in the order of the sort.
If you move doNext() to the end of the method, then the scorers are sorted based on their docID, and then immediately re-sorted (based on their sparseness, which is a heuristic applied based on their first docID).
If I understand correctly, the problem with calling doNext() after the re-sort is this: assume you have 5 scorers, whose first docs are 1, 2, 3, 5, 5 respectively. Sorting leaves the array as is (or changes it to be in that order). If you call doNext() after array.sort, then you advance all the first scorers to 5 (or a larger doc ID they all agree on). However, if you do the re-sort, the order will be 5, 3, 2, 1, 5 and then if you call doNext(), it will stop immediately, since the first scorer's docs equals the last one. So you break an invariant, that after calling doNext() all scorers are on the same doc ID.
That's at least how I understand it. I hope I didn't write too much to explain the obvious (to others).
So I'll delete the nocommit tag.
Ahh – good explanation! Phew. This must be what leads to the two
test failures. So leave it where it is (maybe add a comment
explaining it cannot be moved down, lest anyone else gets tempted).
> Can you add an "assert scorer.docID() == -1" in IndexSearcher right when it gets the scorer?
OK, except we may delete it depending on the relaxing (below)...
> BS2.nextDoc is still needing to check if it's supposed to call initCountingSumScorer? Can we do this in ctor?
It can, if I delete add() and do all the work in the ctor.
I think we should (lacking a DISI.start).
I'll need to pass 3 arrays (Scorer, boolean /required/, boolean /prohibited/) though, or pass the list of Weights, Clauses and IndexReader. I'm not too fan of either. I check which is the lesser evil.
Hmm – I don't have a stronger lesser-of-evils preference. I suppose,
instead, we could add a "start" only to BS2, which BQ would call once
it's done adding? Or, simply call initCountingSumScorer from BQ and
don't do the check per nextDoc/advance.
> BooleanScorer2.score(Collector, int) makes me a bit nervous
I didn't think it's such a big deal since it's one 'if' before scoring starts. It's not like it's called for every score(). Do you think we should resolve it?
What specifically made me nervous that it was even necessary to add
this (having to conditionally next wasn't needed before) in the first
place. If you remove it, does something actually break? Like what
caused it to be added? (Because I want to go explore/understand
> Hmm... maybe we permit DISI.docID() to also return NO_MORE_DOCS after the DISI is created? EG we do this in ConjunctionScorer. (Or, maybe, any docID that's < the first real docID...hmmm.)
I do not fully follow you. By allowing to return -1, we already allow any DISI to return a doc ID that's < the first real doc ID. And allowing to return NO_MORE_DOCS looks strange to me .. I mean imagine a code which creates a DISI and calls doc() and gets back NO_MORE_DOCS .. that'd be strange, won't it?
I'm thinking it's OK to call docID before next, and that all we
require is that call return a docID less than its first real docID.
And, if you know up-front you have no docs, returning NO_MORE_DOCS up
front is also OK.
Ie this is a relaxation over "you must return -1 before nextDoc has
(EG I think the last patch would return NO_MORE_DOCS from docID() in
ConjunctionScorer if it determines in ctor that no docs match).