Maybe rename actualQuery to fromQuery?
Yes, fromQuery makes more sense than actualQuery.
Why preComputedFromDocs...? Like if you were to cache something,
wouldn't you want to cache the toSearcher's bitset instead?
This is for the case where your from query was cached and your
toSearcher's bitset isn't, which is a likely scenario.
But caching the toSearcher's bitset is of course better when
possible. But this should happen outside the JoinQuery, right?
Maybe rename JoinQueryWeight.joinResult to topLevelJoinResult,
I agree, that is a much more descriptive name.
I wonder if we could make this a Filter instead, somehow? Ie, at
its core it converts a top-level bitset in the fromSearcher doc
space into the joined bitset in the toSearcher doc space. It
could even maybe just be a static method taking in fromBitset and
returning toBitset, which could operate per-segment on the
toSearcher side? (Separately: I wonder if JoinQuery should do
something with the scores of the fromQuery....? Not right now but
It just matches docs from the from side to the to side. That is all... So a static method / filter should be able to do the job.
I'm not sure, but if it is a query it might one day be able to encapsulate the joining in the Lucene query language?
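To make the static-method idea above concrete, here is a minimal sketch in plain Java. It is not the JoinQuery code from the patch and it does not use any Lucene API: the class name BitSetJoinSketch, the join method, and the String[][] arrays standing in for each doc's join-field values are all assumptions made for illustration. It only shows the shape of "take fromBitset, return toBitset".

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; names and data layout are assumptions, not Lucene code. */
final class BitSetJoinSketch {

    /**
     * Converts a bitset in the "from" doc space into the joined bitset in the
     * "to" doc space. fromTerms[doc] and toTerms[doc] hold the join-field
     * values of each doc; every set bit in fromBits is assumed to be a valid
     * index into fromTerms.
     */
    static BitSet join(BitSet fromBits, String[][] fromTerms, String[][] toTerms) {
        // Step 1: collect the join terms of every matching "from" doc.
        Set<String> joinTerms = new HashSet<>();
        for (int doc = fromBits.nextSetBit(0); doc >= 0; doc = fromBits.nextSetBit(doc + 1)) {
            for (String term : fromTerms[doc]) {
                joinTerms.add(term);
            }
        }
        // Step 2: mark every "to" doc that carries at least one collected term.
        // This loop only reads joinTerms, so it could run once per segment.
        BitSet toBits = new BitSet(toTerms.length);
        for (int doc = 0; doc < toTerms.length; doc++) {
            for (String term : toTerms[doc]) {
                if (joinTerms.contains(term)) {
                    toBits.set(doc);
                    break;
                }
            }
        }
        return toBits;
    }
}
```

Scores from the fromQuery are ignored here, which matches the "it just matches docs" description; whether a Filter or a Query is the better wrapper around such a method is left open.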
Maybe reword that to state that all joined to/from docs must reside in the same shard?
I wonder if we could use DocTermOrds instead? (Or,
FieldCache.DocTermsIndex or DocValues.BYTES_*, if we know the field is
single-valued). This way we uninvert once (on init), and then doing
the join should be much faster since for each fromDocID we can look up
the term(s) to join on.
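As a rough illustration of "uninvert once, then look up per fromDocID", here is a plain-Java sketch. It does not use DocTermOrds or FieldCache and makes no claim about their actual APIs: the class UninvertSketch, its constructor taking a term-to-postings map, and the termsFor method are all hypothetical stand-ins for the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of an uninverted (docID -> term ords) join field. */
final class UninvertSketch {
    private final List<String> ordToTerm = new ArrayList<>(); // ord -> term, sorted
    private final int[][] docToOrds;                          // docID -> term ords

    /** Uninverts once: walks term -> docIDs postings and builds docID -> ords. */
    UninvertSketch(Map<String, int[]> postings, int maxDoc) {
        List<List<Integer>> perDoc = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            perDoc.add(new ArrayList<Integer>());
        }
        // TreeMap gives terms in sorted order, so ords follow term order.
        for (Map.Entry<String, int[]> e : new TreeMap<>(postings).entrySet()) {
            int ord = ordToTerm.size();
            ordToTerm.add(e.getKey());
            for (int doc : e.getValue()) {
                perDoc.get(doc).add(ord);
            }
        }
        docToOrds = new int[maxDoc][];
        for (int doc = 0; doc < maxDoc; doc++) {
            List<Integer> ords = perDoc.get(doc);
            docToOrds[doc] = new int[ords.size()];
            for (int i = 0; i < ords.size(); i++) {
                docToOrds[doc][i] = ords.get(i);
            }
        }
    }

    /** Per-doc lookup of the join terms, with no scan over the term dictionary. */
    List<String> termsFor(int docID) {
        List<String> result = new ArrayList<>();
        for (int ord : docToOrds[docID]) {
            result.add(ordToTerm.get(ord));
        }
        return result;
    }
}
```

The trade-off is the one discussed in this thread: the docToOrds array costs RAM for every doc in the index, but each from doc's join terms then come from a direct array lookup.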
I really like that idea! This already crossed my mind a few days ago
as an improvement to speed up the joining. It would be nice if the user could
choose between a faster variant that uses more RAM and a slower variant that uses less.
I think we can just make two concrete JoinQuery impls that both have a different