It'd be an option with different tradeoffs. E.g., it wouldn't require
the taxonomy index, since the main index handles label/ord resolution.
There are at least two possible approaches:
- On every reopen, build the seg -> global ord map; then, on
  every collect, get the seg ord, map it to the global ord space,
  and increment counts. This adds cost during reopen in proportion
  to the number of unique terms ...
- On every collect, increment counts based on the seg ords, and then
  do a "merge" at the end, just like distributed faceting does.
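The two approaches above can be sketched roughly as follows. This is a hypothetical, heavily simplified illustration using plain arrays (no Lucene APIs): `segToGlobal` stands in for the per-segment ord map that a real implementation would build from the segments' term dictionaries at reopen, and `segHits` stands in for the seg-local ords seen while collecting. The class and method names are made up for this sketch.

```java
public class GlobalOrdSketch {

  // Approach 1: precompute a seg-ord -> global-ord map at reopen time,
  // then remap every hit's seg ord in the hot collect loop.
  static int[] countWithGlobalMap(int[][] segHits, int[][] segToGlobal,
                                  int numGlobalOrds) {
    int[] counts = new int[numGlobalOrds];
    for (int seg = 0; seg < segHits.length; seg++) {
      for (int segOrd : segHits[seg]) {
        counts[segToGlobal[seg][segOrd]]++;  // remap per hit
      }
    }
    return counts;
  }

  // Approach 2: count in seg-ord space during collect, then do one
  // merge pass into the global space at the end.
  static int[] countThenMerge(int[][] segHits, int[][] segToGlobal,
                              int numGlobalOrds) {
    int[] counts = new int[numGlobalOrds];
    for (int seg = 0; seg < segHits.length; seg++) {
      int[] segCounts = new int[segToGlobal[seg].length];
      for (int segOrd : segHits[seg]) {
        segCounts[segOrd]++;  // cheap: no remap in the hot loop
      }
      for (int segOrd = 0; segOrd < segCounts.length; segOrd++) {
        counts[segToGlobal[seg][segOrd]] += segCounts[segOrd];
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // Two segments; global terms are {a=0, b=1, c=2}.
    // Segment 0 saw {a, b} -> global {0, 1}; segment 1 saw {b, c} -> {1, 2}.
    int[][] segToGlobal = { {0, 1}, {1, 2} };
    int[][] segHits = { {0, 1, 1}, {0, 1} };  // seg-local ords of hits
    System.out.println(java.util.Arrays.toString(
        countWithGlobalMap(segHits, segToGlobal, 3)));  // [1, 3, 1]
    System.out.println(java.util.Arrays.toString(
        countThenMerge(segHits, segToGlobal, 3)));      // [1, 3, 1]
  }
}
```

Both produce the same global counts; the difference is where the remapping cost lands (per hit vs. one pass per segment at the end).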
The first approach is much easier, so I built a quick prototype using
it. The prototype does the counting, but it does NOT gather the top-K
facets at the end, and it doesn't "know" parent/child ord
relationships, so there's lots more to do before this is real. I was
also unsure how to properly integrate it, since the existing classes
seem to expect that you use a taxonomy index to resolve ords.
I ran a quick performance test. base = trunk, except I disabled the
"compute top-K" step in FacetsAccumulator to make the comparison fair;
comp = using the prototype collector in the patch:
I'm impressed that this approach is only ~24% slower in the worst
case! I think this means it's a good option to make available? Yes
it has downsides (NRT reopen more costly, small added RAM usage,
slightly slower faceting), but it's also simpler (no taxo index to