I attached a hacked-up patch... nowhere near committable, various tests
fail, etc. ... yet I think once we clean it up, the approach is viable.
I started from the patch from two iterations ago, and then fixed how the
MTQ BQ rewrite works so that instead of making two passes (first to
gather the matching terms, second to create the weights/scorers and run
the BQ), it now makes a single pass.
In that single pass it records which terms matched which segments, and
creates a TermScorer for each match.
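Roughly, the single pass looks like this. This is only a toy sketch of the idea, not the actual patch: `gatherAndScore`, the segment maps, and the "scorer" lists are all made-up stand-ins, not real Lucene APIs.

```java
import java.util.*;

// Toy sketch of the single-pass rewrite: walk each segment once,
// record which expanded terms actually exist there, create a
// per-segment scorer stub for each hit, and accumulate the
// top-level docFreq as we go. Hypothetical names throughout.
public class SinglePassRewrite {

    // segments: one term->docFreq map per segment (stand-in for the term dict)
    static int gatherAndScore(List<Map<String, Integer>> segments,
                              Set<String> expandedTerms,
                              Map<Integer, List<String>> scorersPerSegment,
                              Map<String, Integer> topLevelDocFreq) {
        int created = 0;
        for (int seg = 0; seg < segments.size(); seg++) {
            Map<String, Integer> termDict = segments.get(seg);
            for (String term : expandedTerms) {
                Integer df = termDict.get(term);   // one "seek" per (segment, term)
                if (df == null) continue;          // term absent here: no scorer
                // record the match and make a per-segment scorer stub
                scorersPerSegment.computeIfAbsent(seg, k -> new ArrayList<>()).add(term);
                topLevelDocFreq.merge(term, df, Integer::sum);
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> segments = List.of(
            Map.of("foo", 3, "foobar", 1),   // segment 0
            Map.of("foo", 2));               // segment 1 (no "foobar")
        Map<Integer, List<String>> scorers = new HashMap<>();
        Map<String, Integer> docFreq = new HashMap<>();
        int n = gatherAndScore(segments, Set.of("foo", "foobar"), scorers, docFreq);
        System.out.println(n + " " + docFreq.get("foo"));  // prints "3 5"
    }
}
```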
After the single pass, once we've summed up the top-level docFreq
across all terms, I go back and reset the weights for all the
TermScorers, sum the squared weights, normalize, etc., and then create
a FakeQuery object whose only purpose is to remember the per-segment
scorers and hand them back when .scorer(...) is called on each segment.
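The fix-up phase can be sketched like so. Again a toy illustration, not the patch itself: `ScorerStub`, `FakeQuery`, and `normalizeAndCache` are invented names, and the weight math is just the generic sum-of-squared-weights / query-norm shape, not Lucene's exact Similarity code.

```java
import java.util.*;

// Toy sketch of the fix-up phase: after the single pass, sum the
// squared weights, derive a query norm, rescale every scorer, and
// stash the finished per-segment scorers in a "FakeQuery" whose only
// job is to hand them back per segment. Hypothetical names throughout.
public class FixUpAndCache {

    static class ScorerStub {
        final int segment;
        double weight;                       // idf-style weight, to be normalized
        ScorerStub(int segment, double weight) { this.segment = segment; this.weight = weight; }
    }

    // Stand-in for the FakeQuery: remembers per-segment scorers and
    // returns them when scorer(segment) is called.
    static class FakeQuery {
        private final Map<Integer, List<ScorerStub>> perSegment = new HashMap<>();
        void add(ScorerStub s) {
            perSegment.computeIfAbsent(s.segment, k -> new ArrayList<>()).add(s);
        }
        List<ScorerStub> scorer(int segment) {
            return perSegment.getOrDefault(segment, List.of());
        }
    }

    static FakeQuery normalizeAndCache(List<ScorerStub> scorers) {
        double sumSq = 0;
        for (ScorerStub s : scorers) sumSq += s.weight * s.weight;  // sum of squared weights
        double norm = sumSq == 0 ? 1.0 : 1.0 / Math.sqrt(sumSq);    // query norm
        FakeQuery fq = new FakeQuery();
        for (ScorerStub s : scorers) {
            s.weight *= norm;               // rescale in place
            fq.add(s);                      // remember it per segment
        }
        return fq;
    }

    public static void main(String[] args) {
        FakeQuery fq = normalizeAndCache(List.of(
            new ScorerStub(0, 3.0), new ScorerStub(1, 4.0)));
        // sumSq = 25, norm = 0.2 -> weights become ~0.6 and ~0.8
        System.out.println(fq.scorer(0).get(0).weight + " " + fq.scorer(1).get(0).weight);
    }
}
```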
The big gain with this approach is that you don't waste effort trying
to seek to non-existent terms in the sub-readers. Normally the terms
cache would save you here, but we never cache a miss, so when we try
to look that term up again it's always a real (costly) seek.
With this approach we can disable using the terms cache entirely from
MTQ.rewrite, which is great.
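To illustrate why the missing negative caching hurts: a cache that only stores hits pays the expensive seek again on every lookup of a non-existent term, while caching the miss (say, via a sentinel) pays it once. A toy sketch, not the real terms-cache code:

```java
import java.util.*;

// Toy terms-cache sketch contrasting hit-only caching with caching
// misses too. Not the real Lucene terms cache; it just counts how
// many "real seeks" into the term dict each strategy performs.
public class TermsCacheSketch {
    static final Object MISS = new Object();   // sentinel marking a cached miss

    int seeks = 0;                             // how many real (costly) seeks we did
    final Map<String, Integer> dict;           // stand-in for the on-disk term dict
    final Map<String, Object> cache = new HashMap<>();
    final boolean cacheMisses;

    TermsCacheSketch(Map<String, Integer> dict, boolean cacheMisses) {
        this.dict = dict;
        this.cacheMisses = cacheMisses;
    }

    Integer lookup(String term) {
        Object cached = cache.get(term);
        if (cached != null) return cached == MISS ? null : (Integer) cached;
        seeks++;                               // not cached: pay for a real seek
        Integer df = dict.get(term);
        if (df != null) cache.put(term, df);
        else if (cacheMisses) cache.put(term, MISS);  // remember the miss
        return df;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = Map.of("foo", 3);
        TermsCacheSketch hitOnly = new TermsCacheSketch(dict, false);
        hitOnly.lookup("zzz"); hitOnly.lookup("zzz");
        TermsCacheSketch negCaching = new TermsCacheSketch(dict, true);
        negCaching.lookup("zzz"); negCaching.lookup("zzz");
        System.out.println(hitOnly.seeks + " " + negCaching.seeks);  // prints "2 1"
    }
}
```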
I believe the patch works correctly, at least for this test, because
on my 10M-doc wikipedia index it gets identical top-N results to
clean trunk. Here are the perf gains:
Note that these gains already include the sizable gains from the
original patch, but the single-pass approach makes further great
gains, especially on e.g. the prefix query.
I don't think we should couple this new patch with this issue... this
issue already gets awesome gains from a fairly minor change...
I'll open a new issue.