Thank you for all the feedback Shai! Responses:
I think that DrillSideways can take a DrillDownQuery (once we finish with
DrillSideways will still need to do its own thing (build the BQ with
minShouldMatch), but it should just take a DDQ.
In .search(), just set minShouldMatch to 1 if (drillDownQueries.size() == 1)? It reads simpler...
Also, why do you need to add a fake Query? I understand the rewrite will eliminate BQ and return TQ, but what's the harm?
Isn't minShouldMatch=1 in that case similar to TQ?
If we don't add the fake query then BQ of a single term will just
rewrite itself to that single term, so we won't find our scorer.
Really we should make a specialized collector for this case (I added a
TODO) because you should just use query foobar OR drill-down, and then
check only the drill-down scorer to see if it matched.
But I think we can do that later (it's an opto).
Extract dims.size() to a variable so it's not executed in every loop?
I think you can drop the if (cp.length > 0)? It doesn't make sense for someone to pass an empty CP. Also, you can assert on that in .addDrillDown()
BTW, I noticed that you test that in DrillSidewaysCollector ctor too.
OK I'll move the check into .addDrillDown, but I think it should be a
real check (not assert).
I wonder if we made 'dims' LinkedHashSet it would perform better than these contains() (in .addDrillDown), get. Then you could just do dims.get(fr.cp.components). I didn't try that in code, so not sure if you can get its index...
Good idea, I think LinkedHashMap should work. I'll try that...
Also, I think we could simplify things if DrillSideways worked like this:
Either exposed a .getQuery() method, or was itself a Query (like DDQ).
Either exposed a .getCollector() method (returning DrillSidewaysCollector) or if it was a Query, you'd just initialize a DrillSidewaysCollector (not a big deal, user-wise).
The collector's getFacetResults() would do the "merging" work that I see in .search()
Won't need DrillSidewaysResult, which today wrap a List<FacetResult> and TopDocs. Someone could MultiCollector.wrap(topDocsCollector, sidewaysCollector)? Just like w/ facets?
Won't need the multitude of search() methods. Again, someone could wrap TopDocsCollector, CachingCollector, TopFieldsCollector...
Alas, it's not so simple: you can't use
MultiCollector.wrap(topDocsCollector, sidewaysCollector) because the
BooleanQuery (DrillSidewaysQuery if we do that) hits too many results
(hits + near-misses), and this is why it needs its own collector.
All other collectors you may want for your search (grouping, joining,
etc) need to be MultiCollector.wrap'd and then passed to
So I'm not sure what value it is to provide the Query/Collector
externally: all you can really do with them is search on the query,
collecting with the collector.
I'm also torn about "opening up" the Query/Collector because we may
change the impl on how the near-miss counts are collected. EG Solr
makes bit sets and does multiple passes to aggregate the near-misses.
In DrillSidewaysCollector ctor:
if (drillSidewaysRequest == null) – that means the user asked to drill-down on some CPs for dim X, but not requested to count it, right?
Do we must throw an exception? Perhaps we can just drop the relevant Query clause? Although, it's not very expected that a user would do that ... so perhaps keep the code for simplicity.
Or, we could just return null as the FacetResult for that dimension?
The thing is, I suspect it's unusual that the app would want this, and
then in those cases that they do, they could drill-down on their main
query (ie, pass DrillDownQuery as the Query to DrillSideways) and then
we won't count the sideways counts for it.
Instead of doing Collections.singletonList you can just pass the single FacetRequest to the vararg ctor.
If you feel like it, we can optimize FacetSearchParams' vararg ctors to initialize a singletonList if facetRequests.length == 1.
I don't think we need to ... looks like we use Arrays.asList.
exactCount = Math.max(2, dims.size()); – maybe add a comment why '2'?
Why does Scorer.getChildren() return a Collection and not List? We used to have that in IR.listCommits while in practice it was always a List. Can we fix Scorer?
I looked at all Scorer.getChildren() impls and they either return a List (ArrayList in most cases) or Collections.singleton (which is a Set). So it's indeed dangerous to assume it's a List, but I think we should just fix Scorer?
I don't think we can expect Scorer.getChildren to return a List,
matching the order of the sub-clauses, in general. Ie, it will depend
on each query: sometimes a Scorer can be dropped, if that clause's
scorer was null, or the Scorers can be reordered (eg,
ConjunctionScorer reorders by expected freq i think).
What do you mean by "// nocommit fragile: need tracker somehow..."? What's tracker?
Well, we cannot just cast to List and assume number of sub-scorers is
the same as number of clauses. EG if the user's main query returns a
null Scorer because it matches no hits, then BQ will only return a
length 1 list to us.
So to properly handle this I think we need a wrapper Query class (this
original idea came from selckin I think), whose purpose is to track
which Scorer instance was created for our "near-miss" BQ.
Can you add some documentation to the 'if-else'?