I'll need to check LUCENE-3129 for equivalence with PerParentLimitQuery. It's certainly a central part of what I typically deploy for nested queries: pass 1 is usually a NestedDocumentQuery to get the best parents, and pass 2 uses PerParentLimitQuery to get the best children for those best parents.
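For anyone following along, the two-pass pattern can be sketched in plain Java collections. This is not the NestedDocumentQuery/PerParentLimitQuery API. TwoPassNested, Hit and the parentOf mapping are hypothetical names, and the sketch assumes a parent scores as its best matching child.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical two-pass sketch in plain Java; NOT Lucene's NestedDocumentQuery or
// PerParentLimitQuery. Assumes a parent scores as the max of its matching children.
class TwoPassNested {
    // One matching child doc and its score.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Pass 1: score every parent by its best child and keep the top k parent ids.
    static List<Integer> topParents(List<Hit> hits, IntUnaryOperator parentOf, int k) {
        Map<Integer, Float> best = new HashMap<>();
        for (Hit h : hits) {
            best.merge(parentOf.applyAsInt(h.doc), h.score, Math::max);
        }
        List<Integer> parents = new ArrayList<>(best.keySet());
        // Highest parent score first; ties broken by lower doc id so the order is stable.
        parents.sort((a, b) -> {
            int c = Float.compare(best.get(b), best.get(a));
            return c != 0 ? c : Integer.compare(a, b);
        });
        return new ArrayList<>(parents.subList(0, Math.min(k, parents.size())));
    }

    // Pass 2: for each of the top parents only, keep its best n children.
    static Map<Integer, List<Hit>> topChildren(List<Hit> hits, IntUnaryOperator parentOf,
                                               List<Integer> parents, int n) {
        Map<Integer, List<Hit>> groups = new LinkedHashMap<>();
        for (int p : parents) {
            groups.put(p, new ArrayList<>());
        }
        for (Hit h : hits) {
            List<Hit> group = groups.get(parentOf.applyAsInt(h.doc));
            if (group != null) { // child of a parent that made the cut in pass 1
                group.add(h);
            }
        }
        for (List<Hit> group : groups.values()) {
            group.sort((a, b) -> Float.compare(b.score, a.score));
            if (group.size() > n) {
                group.subList(n, group.size()).clear();
            }
        }
        return groups;
    }
}
```

The RAM cost shows up in pass 1, which holds a score for every matching parent, and both passes walk the full hit list.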
Hmm, so I wonder if we could do this in one pass? That is, like grouping,
if you indexed your docs as blocks, you can use the faster single-pass
collector; but if you didn't, you can use the more general but slower
and more-RAM-consuming two pass collector.
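With block indexing, one pass only needs to buffer the current group, because children of a parent are contiguous. A minimal sketch, assuming each block is [child..., parent], that child hits arrive in increasing doc-id order, and that java.util.BitSet stands in for the parent filter's bitset. BlockGrouping and onGroup are hypothetical names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical single-pass grouping over block-indexed docs.
// Assumes: each block is [child..., parent] (parent last), parentBits has a bit
// set for every parent doc, and childHits holds only child docs in increasing order.
class BlockGrouping {
    static void group(int[] childHits, BitSet parentBits,
                      BiConsumer<Integer, List<Integer>> onGroup) {
        int currentParent = -1;
        List<Integer> current = new ArrayList<>();
        for (int doc : childHits) {
            // The first parent at or after a child is that child's parent.
            int parent = parentBits.nextSetBit(doc);
            if (parent != currentParent && !current.isEmpty()) {
                // Blocks are contiguous, so the previous group is complete:
                // hand it off (e.g. to a top-N queue) and start a new one.
                onGroup.accept(currentParent, current);
                current = new ArrayList<>();
            }
            currentParent = parent;
            current.add(doc);
        }
        if (!current.isEmpty()) {
            onGroup.accept(currentParent, current); // flush the last group
        }
    }
}
```

Unlike the general two-pass approach, nothing here needs a per-parent map over the whole result set; the callback can feed a bounded queue.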
It seems like we should be able to do something similar with joins,
somehow... i.e., Solr's join impl is a start at the "fully general"
But I agree the "join child to parent" and then "grouping of child
docs" go hand in hand for searching...
What do you do for facet counting in these apps...? Post-grouping
faceting also ties in here.
Of course, some apps can simply fetch ALL children for the top parents, but in some cases summarising children is required
(note: this is potentially a great solution for performance issues with highlighting big docs, e.g. entire books).
I think it'd be compelling to index book/articles with each
page/section/chapter being a new doc, and then group them under their
I haven't benchmarked nextSetBit vs the existing "rewind" implementation but I imagine it may be quicker.
I think it should be much faster: obs.nextSetBit looks heavily
optimized, since it can operate a word at a time. Though, if the
groups are smallish, so that nextSetBit is often maybe 2 or 3 bits
away, I'm not sure it'd be faster...
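To make the comparison concrete, here are the two lookups side by side on java.util.BitSet. The backward bit-at-a-time loop is only a guess at what a "rewind" lookup does, and ParentLookup is a hypothetical name.

```java
import java.util.BitSet;

// Hypothetical comparison of two ways to find a child's parent doc id.
class ParentLookup {
    // Parent-first layout [parent, child...]: walk backwards one bit at a time
    // (an assumed "rewind"-style lookup). Returns -1 if no parent precedes the child.
    static int rewind(BitSet parentBits, int childDoc) {
        int doc = childDoc - 1;
        while (doc >= 0 && !parentBits.get(doc)) {
            doc--;
        }
        return doc;
    }

    // Parent-last layout [child..., parent]: forward search, which BitSet
    // performs a 64-bit word at a time. Returns -1 if no parent follows.
    static int forward(BitSet parentBits, int childDoc) {
        return parentBits.nextSetBit(childDoc + 1);
    }
}
```

When the parent is only 2 or 3 bits away, the rewind loop exits after a couple of get() calls, so the gap may well be small for smallish groups. java.util.BitSet (Java 7+) also has previousSetBit, a word-at-a-time backward search, which could narrow the gap for the parent-first layout.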
Parent-followed-by-children seems more natural from a user's point of view, however.
But is it really so bad to ask the app to put parent doc last?
I mean, the docs have to be indexed w/ the new doc block APIs in IW
anyway, which will often be e.g. a List<Document>, at which point
putting parent last seems a minor imposition?
Since this is an expert API I think it's OK to put [minor] impositions
on its usage if this can simplify the impl / make it faster / less
risky. That said, I'm not yet sure on the impl (single pass query +
collector vs generic two-pass join that solr now has), so it's
probably premature to worry about this...
I guess you could always keep the parent-then-child insertion order but flip the bitset (then cache) for query execution if that was faster.
True, but this adds some hair to the impl (we must also "flip" coming
back from nextSetBit)...
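Roughly, that extra hair looks like this: a minimal sketch, assuming a parent-first layout over doc ids 0..maxDoc-1 and java.util.BitSet in place of the real bitset. FlippedParents is a hypothetical name. Every lookup maps the child into flipped space and then maps the result back.

```java
import java.util.BitSet;

// Hypothetical sketch: keep parent-first insertion order, but build (and cache)
// a reversed copy of the parent bitset so the forward nextSetBit search applies.
class FlippedParents {
    private final BitSet flipped;
    private final int maxDoc;

    FlippedParents(BitSet parentBits, int maxDoc) {
        this.maxDoc = maxDoc;
        this.flipped = new BitSet(maxDoc);
        for (int p = parentBits.nextSetBit(0); p >= 0 && p < maxDoc; p = parentBits.nextSetBit(p + 1)) {
            flipped.set(maxDoc - 1 - p); // doc p lives at position maxDoc-1-p
        }
    }

    // Parent of childDoc in the original (parent-first) numbering, or -1 if none.
    int parentOf(int childDoc) {
        // Flip going in: position maxDoc-1-childDoc, then search strictly after it.
        int f = flipped.nextSetBit(maxDoc - childDoc);
        // Flip coming back out of nextSetBit.
        return f < 0 ? -1 : maxDoc - 1 - f;
    }
}
```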
Benchmarking rewind vs nextSetBit vs flip then nextSetBit would reveal all.
True, though it'd be best to do this in the context of the actual join impl...