Details
- Improvement
- Status: Closed
- Trivial
- Resolution: Fixed
- main (10.0)
- None
Description
The "relatedness" facet function supports the concepts of foreground_popularity and background_popularity: the cardinality of the intersection of the bucket domain with the foreground and background sets (respectively), each normalized by the cardinality of the background set.
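As a minimal sketch of the two definitions above, assuming plain document counts as inputs (the class and method names are hypothetical and are not taken from the Solr source):

```java
// Illustrative only: Popularity, foreground() and background() are
// hypothetical names, not part of RelatednessAgg.
final class Popularity {

    /** |bucket ∩ foreground set| divided by |background set|. */
    static double foreground(long fgCount, long bgSize) {
        return (double) fgCount / bgSize;
    }

    /** |bucket ∩ background set| divided by |background set|. */
    static double background(long bgCount, long bgSize) {
        return (double) bgCount / bgSize;
    }

    public static void main(String[] args) {
        long bgSize = 1_000L; // assumed background set cardinality
        // 20 docs of the bucket fall in the foreground set, 50 in the background set.
        System.out.println(foreground(20L, bgSize));
        System.out.println(background(50L, bgSize));
    }
}
```

Both values share the same denominator, so a small foreground set over a large background set yields very small foreground popularities.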
These popularity values appear to serve two purposes:
- To provide clients with context for computed relatedness values
- To preemptively (and optionally) screen out "noise" from low-frequency terms via the min_popularity function parameter.
For both purposes, popularity values are currently rounded to 5 decimal digits.
Rounding to 5 digits makes sense for the first purpose (providing context to clients). This issue proposes that, as currently implemented, the same arbitrary rounding does not make sense for the second: internally evaluating popularity values against the min_popularity threshold for bucket inclusion.
Consider the case of a high-cardinality field with a relatively large background set and a selective foreground set. For |background_set| = 2,000,000 and a foreground set of cardinality 9, even a bucket whose domain exactly matches the foreground set has a foreground_popularity of only 9 / 2,000,000 = 0.0000045, which rounds to 0. Such a bucket would therefore be screened out for any explicit (positive) setting of min_popularity.
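The arithmetic in this example can be reproduced with a short sketch. Everything here is an assumption made for illustration: the class name, the rounding helper, and the two comparison methods are hypothetical and only approximate the behavior described in this issue, not the actual RelatednessAgg code.

```java
// Illustrative only: hypothetical names, not the Solr implementation.
final class MinPopularityDemo {

    /** Rounds to 5 decimal places, as described for popularity values. */
    static double roundTo5Digits(double value) {
        return Math.round(value * 100_000d) / 100_000d;
    }

    /** Threshold check against the rounded popularity (behavior described here). */
    static boolean passesRounded(long count, long bgSize, double minPopularity) {
        return roundTo5Digits((double) count / bgSize) >= minPopularity;
    }

    /** Threshold check against the unrounded popularity (the proposed alternative). */
    static boolean passesRaw(long count, long bgSize, double minPopularity) {
        return (double) count / bgSize >= minPopularity;
    }

    public static void main(String[] args) {
        long bgSize = 2_000_000L;
        long fgCount = 9L;               // bucket domain exactly matches the foreground set
        double minPopularity = 0.000004; // assumed threshold, below the bucket's raw popularity

        System.out.println(roundTo5Digits((double) fgCount / bgSize));     // prints 0.0
        System.out.println(passesRounded(fgCount, bgSize, minPopularity)); // prints false
        System.out.println(passesRaw(fgCount, bgSize, minPopularity));     // prints true
    }
}
```

Comparing against the raw value while rounding only what is reported to clients keeps the output compact without changing which buckets are included.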
This behavior is due to where the rounding takes place (internally, upon initial computeDerivedValues()). It is further problematic that RelatednessAgg will currently accept min_popularity < 0.00001, which would be guaranteed to exclude all buckets.