Details
- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
Description
Currently, when dealing with high-cardinality fields, the facet module offers two implementations (unique, hll) that give approximate results. There is one corner case where a distributed search against a high-cardinality field should still be able to provide an exact result efficiently: when the shards are known to contain disjoint values, i.e. there may be duplicates within a shard, but no value exists on more than one shard. In that case the exact global count is simply the sum of the per-shard unique counts.
That happens to be the case in my collection, but this feels to me like a very niche use case. Is this functionality too niche for inclusion in the Facet module?
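To make the corner case concrete, here is a minimal standalone sketch of the idea. It is not the attached implementation and does not use any Solr classes; the class and method names (DisjointShardUnique, localUnique, mergeExact, bruteForce) are hypothetical and exist only for illustration. It assumes each shard reports its own exact distinct count and that the coordinator simply adds those counts together.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: none of these names come from Solr.
final class DisjointShardUnique {

    // Per-shard step: count distinct values locally.
    // Duplicates inside a single shard are collapsed by the set.
    static long localUnique(List<String> shardValues) {
        return new HashSet<>(shardValues).size();
    }

    // Merge step: ASSUMES no value occurs on more than one shard,
    // so the global distinct count is the sum of the local counts.
    static long mergeExact(List<List<String>> shards) {
        long total = 0;
        for (List<String> shard : shards) {
            total += localUnique(shard);
        }
        return total;
    }

    // Reference answer: union all values in one place and count them.
    // This is what the merge step avoids shipping across the network.
    static long bruteForce(List<List<String>> shards) {
        Set<String> all = new HashSet<>();
        for (List<String> shard : shards) {
            all.addAll(shard);
        }
        return all.size();
    }

    public static void main(String[] args) {
        List<List<String>> shards = Arrays.asList(
                Arrays.asList("a", "b", "a"),        // 2 distinct values
                Arrays.asList("c", "c", "d", "e"));  // 3 distinct values
        System.out.println(mergeExact(shards));  // prints 5
        System.out.println(bruteForce(shards));  // prints 5
    }
}
```

If the disjointness assumption is violated (say "a" also appears on the second shard), mergeExact over-counts relative to bruteForce, which is why this can only ever be an opt-in behaviour.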
I attach a naive (untested) example implementation. It could be made slightly more efficient by using SlotAcc implementations that do not populate the first 100 values (or by making this behaviour configurable, perhaps via the FacetContext?).
Slightly off topic, but the documentation currently says of unique "Beyond 100 values it yields not exact estimate". My understanding is that this is actually only true for distributed faceting, and that it is exact in the non-distributed case.
UniqueAgg calculates sumUnique, but does not appear to actually use it.
Attachments
Issue Links
- is superseded by SOLR-14518 Add support for partitioned unique agg to JSON facets (Open)