I added grouping queries to the nightly benchmarks
(http://people.apache.org/~mikemccand/lucenebench) – see
TermGroup100/10K/1M. The "F" annotation is the day grouping queries
were added.
Those queries are the same queries running as TermQuery, just with
grouping turned on for 3 randomly generated fields, with 100, 10,000
and 1 million unique values respectively. So we can gauge the perf hit by
comparing to TermQuery each night.
I use the CachingCollector.
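
As a rough illustration of the nightly comparison, the perf hit can be
expressed as the relative drop in QPS against the TermQuery baseline. This
is only a sketch: the function name and the QPS figures below are
hypothetical and are not taken from the benchmark itself.

```python
def relative_slowdown(baseline_qps: float, candidate_qps: float) -> float:
    """Return how much slower the candidate is, as a percentage of baseline QPS."""
    if baseline_qps <= 0:
        raise ValueError("baseline_qps must be positive")
    return (baseline_qps - candidate_qps) / baseline_qps * 100.0


# Hypothetical numbers, for illustration only (not real benchmark results).
term_query_qps = 50.0   # plain TermQuery
term_group_qps = 40.0   # the same query with grouping turned on

hit = relative_slowdown(term_query_qps, term_group_qps)
print(f"grouping perf hit: {hit:.1f}% of TermQuery QPS")
```

The same calculation works for comparing two grouping cases against each
other, for example the 1M unique groups case against the 100 unique groups
case.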
First off, I'm impressed that the perf hit for grouping is not too bad;
I had expected we'd pay a bigger perf hit!
Second, the more unique groups you have, the slower grouping gets,
but that multiplier really isn't so bad – the 1M unique groups case
is only 10.6% slower than the 100 unique groups case.
Remember, though, that these groups are randomly generated
full-unicode strings, so real data could very well produce different
results.
Third, and this is insanity, the addition of grouping caused other
unexpected changes. Most horribly, SpanNearQuery slowed down
while other queries seem to get a bit faster. I think this is
[frustratingly!] due to hotspot making different decisions about which
code to optimize/inline.
Similarly strange, when I added sorting (TermQuery sorting by title
and date/time, "E" annotation in all graphs), I saw the variance in
the unsorted TermQuery performance drop substantially. I'm pretty
sure this wide variance was due to hotspot's erratic decision making,
but somehow the addition of sorting, while not changing TermQuery's mean
QPS, caused hotspot to at least be somewhat more consistent in how it
compiled the code. Maybe as we add more and more diverse queries to
the benchmark we'll see hotspot behave more "reasonably"....
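
One way to put a number on "variance dropped while the mean stayed the
same" is the coefficient of variation (standard deviation divided by mean)
over a window of nightly QPS samples. The sketch below assumes that metric,
which may differ from what the nightly charts actually plot, and the sample
values are made up for illustration.

```python
import statistics


def coefficient_of_variation(qps_samples):
    """Sample standard deviation divided by mean; lower means steadier runs."""
    mean = statistics.mean(qps_samples)
    if mean == 0:
        raise ValueError("mean QPS must be non-zero")
    return statistics.stdev(qps_samples) / mean


# Hypothetical nightly QPS samples (not real benchmark data).
before_sorting = [40.0, 50.0, 60.0]   # wide night-to-night swings
after_sorting = [49.0, 50.0, 51.0]    # same mean, much tighter spread

print(f"before: mean={statistics.mean(before_sorting):.1f}, "
      f"cv={coefficient_of_variation(before_sorting):.3f}")
print(f"after:  mean={statistics.mean(after_sorting):.1f}, "
      f"cv={coefficient_of_variation(after_sorting):.3f}")
```

Because the metric is normalized by the mean, it allows the steadiness of
queries with very different absolute QPS to be compared on one scale.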