  1. Solr
  2. SOLR-16144

Don't internally round [foreground|background]_popularity values in RelatednessAgg

Details

    • Improvement
    • Status: Closed
    • Trivial
    • Resolution: Fixed
    • main (10.0)
    • 9.1
    • Facet Module
    • None

    Description

      The "relatedness" facet function supports the concept of foreground_popularity and background_popularity – i.e., the cardinality of the intersection of bucket domain with the foreground and background sets (respectively), each normalized with respect to background set cardinality.

      The rationale for these values appears to be twofold:

      1. To provide clients with context of computed relatedness values
      2. To preemptively (optionally) screen out "noise" from low-frequency terms via the min_popularity function parameter.

      For both purposes, popularity values are currently rounded to 5 decimal places.

      This issue proposes that although rounding to 5 decimal places makes sense for the first case (providing context to clients), this arbitrary loss of precision does not make sense as currently implemented for internally evaluating threshold popularity values for bucket inclusion.

      Consider the case of a high-cardinality field with a relatively large background set and a selective foreground set. For |background_set| = 2,000,000 and a foreground set of cardinality 9, the largest possible foreground_popularity is 9 / 2,000,000 = 0.0000045, which rounds to 0 at 5 decimal places. As a result, even a bucket with a domain that exactly matches the foreground set would be screened out, for any explicit (positive) setting of min_popularity.
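The arithmetic in this example can be checked with a small standalone sketch. It is not the RelatednessAgg code: the class name, the helper names `popularity` and `roundTo5Digits`, the rounding formula, and the `>=` threshold comparison are all assumptions made for illustration.

```java
public class PopularityRoundingSketch {

    // Assumed rounding helper: keep 5 decimal places, as the issue describes.
    static double roundTo5Digits(double val) {
        return Math.round(val * 1e5) / 1e5;
    }

    // Popularity as described above: intersection size normalized by
    // background set cardinality.
    static double popularity(long intersectionSize, long backgroundSize) {
        return (double) intersectionSize / backgroundSize;
    }

    public static void main(String[] args) {
        // A bucket whose domain exactly matches a 9-document foreground set,
        // against a background set of 2,000,000 documents.
        double raw = popularity(9, 2_000_000);
        double rounded = roundTo5Digits(raw);

        // Hypothetical threshold below the rounding granularity.
        double minPopularity = 0.000001;

        System.out.println("raw passes:     " + (raw >= minPopularity));     // true
        System.out.println("rounded passes: " + (rounded >= minPopularity)); // false
    }
}
```

Under these assumptions the unrounded value clears the threshold while the rounded value collapses to zero, which is the screening effect described above.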

      This behavior is due to where the rounding takes place (internally, upon initial computeDerivedValues()). It is further problematic that RelatednessAgg will currently accept min_popularity < 0.00001, which would be guaranteed to exclude all buckets.
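One way to read the proposal is to keep full-precision popularity values for the internal threshold check and to round only when values are reported to clients. The following is a minimal sketch of that idea, not the actual patch: the `Bucket` class, its fields and methods, and the `>=` comparison are hypothetical names and choices introduced for illustration.

```java
public class UnroundedThresholdSketch {

    // Assumed display-only rounding to 5 decimal places.
    static double roundTo5Digits(double val) {
        return Math.round(val * 1e5) / 1e5;
    }

    // Hypothetical per-bucket holder; not a Solr class.
    static final class Bucket {
        final double fgPopularity; // unrounded, used for thresholding
        final double bgPopularity; // unrounded, used for thresholding

        Bucket(long fgCount, long bgCount, long backgroundSize) {
            this.fgPopularity = (double) fgCount / backgroundSize;
            this.bgPopularity = (double) bgCount / backgroundSize;
        }

        // Threshold check on the unrounded values.
        boolean passes(double minPopularity) {
            return fgPopularity >= minPopularity && bgPopularity >= minPopularity;
        }

        // Rounding applied only to the value shown to clients.
        double reportedFgPopularity() {
            return roundTo5Digits(fgPopularity);
        }
    }

    public static void main(String[] args) {
        Bucket b = new Bucket(9, 9, 2_000_000);
        System.out.println(b.passes(0.000001));       // true: 0.0000045 >= 0.000001
        System.out.println(b.passes(0.00001));        // false: 0.0000045 < 0.00001
        System.out.println(b.reportedFgPopularity()); // 0.0 once rounded for display
    }
}
```

With this split, a min_popularity below 0.00001 becomes a meaningful setting instead of one that can never be satisfied by values that round to zero.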

            People

              Assignee: magibney Michael Gibney
              Reporter: magibney Michael Gibney

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h