Solr / SOLR-11733

Add an option to make json.facet refinement more "optimistic", like facet.field/facet.pivot, so that long tail terms have a chance to bubble up

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Facet Module
    • Labels: None

    Description

      json.facet refinement is currently "pessimistic" by default. Specifically, "long tail" terms that are not in the "top n" on every shard, but are in the "top n + overrequest" for at least one shard, are in some cases not refined and therefore not included in the aggregated response.

      This differs from the "optimistic" approach taken by the existing facet.field and facet.pivot refinement, which refines every known term whose count might be high enough to put it in the top n, based on what is known about the lowest count returned by each shard in phase #1.
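
      The "optimistic" selection described above can be illustrated with a minimal sketch. This is not Solr's actual implementation: the function name, its arguments, and the sample data are hypothetical, and it assumes facets sorted by descending count with no mincount handling. For each term seen in phase #1, it adds the lowest count returned by every shard that did not report the term, and asks for refinement when that best-case total could still reach the current top n.

```python
from typing import Dict, List


def optimistic_refinement_candidates(
    shard_results: List[Dict[str, int]],
    limit: int,
    shard_request_size: int,
) -> Dict[str, List[int]]:
    """Map each term worth refining to the shard indexes that must be asked.

    shard_results[i] holds the term -> count pairs shard i returned in phase #1.
    shard_request_size is the number of terms requested per shard
    (limit + overrequest). All names here are illustrative assumptions.
    """
    # Lower bound for every term: the sum of the counts actually returned.
    known: Dict[str, int] = {}
    for counts in shard_results:
        for term, count in counts.items():
            known[term] = known.get(term, 0) + count

    # The most an unreported term could have on each shard. A shard that
    # returned fewer terms than requested has nothing left to report, so 0.
    missing_max = [
        min(counts.values()) if counts and len(counts) >= shard_request_size else 0
        for counts in shard_results
    ]

    # Threshold: the limit-th highest known total (0 if fewer terms are known).
    ranked = sorted(known.values(), reverse=True)
    threshold = ranked[limit - 1] if len(ranked) >= limit else 0

    candidates: Dict[str, List[int]] = {}
    for term, total in known.items():
        upper_bound = total
        shards_to_ask: List[int] = []
        for index, counts in enumerate(shard_results):
            if term not in counts and missing_max[index] > 0:
                upper_bound += missing_max[index]
                shards_to_ask.append(index)
        # Refine only if some shard could add to the count and the
        # best-case total could still reach the top `limit`.
        if shards_to_ask and upper_bound >= threshold:
            candidates[term] = shards_to_ask
    return candidates


# Hypothetical phase #1 results from two shards (limit=2, 3 terms per shard).
example = optimistic_refinement_candidates(
    [{"a": 100, "x": 60, "b": 20}, {"a": 100, "b": 30, "c": 25}],
    limit=2,
    shard_request_size=3,
)
print(example)  # {'x': [1]}
```

      In this sample, "x" was reported by only one shard, yet its best-case total (60 + 25) still reaches the current second-place total of 60, so it is refined against the other shard. "c" is skipped because even its best case (25 + 20) cannot reach the top 2.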

      As a mitigation, people with particular concerns about long tail terms can set a "high" value for the overrefine parameter, which forces Solr to refine more terms from phase #1. This is somewhat of a "brute force" workaround, since it does not take into account any known information about the results of each shard from phase #1.
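
      As an illustration of that workaround, the sketch below builds request parameters for a json.facet terms facet with refinement enabled and a large overrefine value. The facet name "categories", the field "category_s", the query, and the value 100 are all hypothetical choices for the example, not recommendations from this issue.

```python
import json
from typing import Dict


def build_facet_params(field: str, limit: int, overrefine: int) -> Dict[str, str]:
    """Build Solr request parameters for a refined terms facet.

    The facet name "categories" is an arbitrary label chosen for this sketch.
    """
    facet = {
        "categories": {
            "type": "terms",
            "field": field,
            "limit": limit,
            "refine": True,            # enable the second (refinement) phase
            "overrefine": overrefine,  # consider this many extra buckets for refinement
        }
    }
    return {"q": "*:*", "rows": "0", "json.facet": json.dumps(facet)}


# Hypothetical field name and values.
params = build_facet_params("category_s", limit=10, overrefine=100)
print(params["json.facet"])
```

      The resulting parameters can be sent to a collection's select handler with any HTTP client. A larger overrefine value increases the work done in the refinement phase, which is the cost of this workaround.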

      This issue tracks possible improvements that would make the refinement logic in the faceting code more sophisticated.
       


      (NOTE: this Jira was originally filed as a bug report noting that json.facet refinement didn't seem to be working properly compared to facet.field refinement, and the early comments were written with that mindset.)

            People

              Assignee: Unassigned
              Reporter: Chris M. Hostetter (hossman)
              Votes: 0
              Watchers: 2
