SOLR-9152

Change the default of facet.distrib.mco from false to true


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved

    Description

      SOLR-8988 added a new query option, facet.distrib.mco, which when set to true allows facet.mincount=1 to be used in the per-shard requests in cloud mode. The previous behavior, and the current default, is that facet.mincount=0 is sent to the shards in cloud mode (under the conditions described below).

      What exactly would be changed?

      The default of facet.distrib.mco=false would be changed to facet.distrib.mco=true.
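      A minimal sketch of what the default flip means for a single request, assuming request parameters arrive as a plain Map<String, String>. The class McoDefault and the helper resolveMco are hypothetical and are not Solr's SolrParams API; the sketch only shows that an absent parameter takes the default, while an explicit facet.distrib.mco=false still opts out under the new default.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Solr's SolrParams API): resolving facet.distrib.mco
 * from a request, with the default supplied by the caller.
 */
public class McoDefault {
    static final String FACET_DISTRIB_MCO = "facet.distrib.mco";

    /** Returns the explicit request value if present, otherwise the given default. */
    static boolean resolveMco(Map<String, String> params, boolean defaultValue) {
        String raw = params.get(FACET_DISTRIB_MCO);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        Map<String, String> unset = new HashMap<>();
        Map<String, String> optOut = new HashMap<>();
        optOut.put(FACET_DISTRIB_MCO, "false");

        // Current default (false): the optimization is off unless requested.
        System.out.println(resolveMco(unset, false));  // false
        // Proposed default (true): the optimization is on unless disabled.
        System.out.println(resolveMco(unset, true));   // true
        // An explicit facet.distrib.mco=false still opts out under the new default.
        System.out.println(resolveMco(optOut, true));  // false
    }
}
```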

      When is this option effective?

      From the documentation:

      /**
       * If we are returning facet field counts, are sorting those facets by their count, and the minimum count to return is > 0,
       * then allow the use of facet.mincount = 1 in cloud mode. To enable this use facet.distrib.mco=true.
       *
       * i.e. If the following three conditions are met in cloud mode: facet.sort=count, facet.limit > 0, facet.mincount > 0.
       * Then use facet.mincount=1.
       *
       * Previously and by default facet.mincount will be explicitly set to 0 when in cloud mode for this condition.
       * In SOLR-8599 and SOLR-8988, significant performance increase has been seen when enabling this optimization.
       *
       * Note: enabling this flag has no effect when the conditions above are not met. For those other cases the default behavior is sufficient.
       */
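      The rule in the javadoc can be sketched as a small function that picks the facet.mincount sent to each shard in a distributed (cloud mode) request. ShardMincount and shardMincount are hypothetical names, not Solr's FacetComponent code. Passing the user's mincount through unchanged when the conditions are not met is a simplifying assumption made here; the javadoc only says the flag has no effect in those cases.

```java
/**
 * Hypothetical sketch of the documented rule (not Solr's FacetComponent code):
 * in cloud mode, when facet.sort=count, facet.limit > 0 and facet.mincount > 0,
 * the per-shard request uses facet.mincount=1 if facet.distrib.mco=true, else 0.
 */
public class ShardMincount {

    static int shardMincount(String sort, int limit, int mincount, boolean mco) {
        boolean conditionsMet = "count".equals(sort) && limit > 0 && mincount > 0;
        if (!conditionsMet) {
            // Assumption: outside the documented conditions the flag is ignored and
            // the user's mincount is passed through (a simplification).
            return mincount;
        }
        // Conditions met: the flag decides between 1 and the old default of 0.
        return mco ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(shardMincount("count", 10, 1, false)); // 0: current default
        System.out.println(shardMincount("count", 10, 5, true));  // 1: with facet.distrib.mco=true
        System.out.println(shardMincount("index", 10, 5, true));  // 5: conditions not met
    }
}
```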
      

      What is the result of turning this option on?

      When facet.distrib.mco=true and the conditions above are met, the requests Solr sends to the individual shards include facet.mincount=1. As a result, each shard only considers terms with a count > 0 when processing its part of the request. This can yield a significant performance gain when the field has high cardinality and the matching docset is relatively small, because terms with 0 matches are not considered.
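      A toy model of why excluding zero-count terms matters, assuming a shard whose field has 100,000 distinct terms of which only three occur in the matching docset. CandidateTerms and candidates are hypothetical; the model only counts which terms qualify at a given mincount and does not reproduce how Solr's faceting code avoids enumerating the others.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not Solr code): which terms of a high-cardinality field remain
 * candidates for a shard's facet response at a given mincount.
 */
public class CandidateTerms {

    /** Terms whose count in the matching docset is >= mincount. */
    static List<String> candidates(List<String> fieldTerms, Map<String, Integer> counts, int mincount) {
        List<String> out = new ArrayList<>();
        for (String term : fieldTerms) {
            int c = counts.getOrDefault(term, 0); // absent from the docset -> count 0
            if (c >= mincount) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 100,000 distinct terms in the field (high cardinality).
        List<String> fieldTerms = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            fieldTerms.add("term" + i);
        }
        // Only three terms occur in the small matching docset.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("term7", 3);
        counts.put("term42", 1);
        counts.put("term99999", 2);

        System.out.println(candidates(fieldTerms, counts, 0).size()); // 100000
        System.out.println(candidates(fieldTerms, counts, 1).size()); // 3
    }
}
```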

      As shown in SOLR-8988, the runtime of a single query was reduced from 20 seconds to less than 1 second.

      Can this change result in worse performance?

      The current thinking is no: performance should not get worse, even in non-optimal scenarios. From the comments in SOLR-8988:

      Suppose you ask shardA for up to 10 terms with mincount=1 but receive only 5 terms back. In this case you know, definitively, that a term seen in the response from shardB but not in the response from shardA has a count of 0 in shardA. If it had any other count in shardA, it would have been returned in the response from shardA.

      Also, if you ask shardA for up to 10 terms with mincount=1 and get back 10 terms, each with a count >= 1, then the response is identical to the one you would have received with mincount=0.
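      The two cases above can be checked with a toy shard that returns its top `limit` terms sorted by count descending, with ties broken by term ascending (an assumption made here so the output is deterministic). MincountArgument and topTerms are hypothetical names; this is a sketch of the argument, not Solr's shard-response or merge logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model (not Solr code): a shard's top `limit` facet terms at a given
 * mincount, sorted by count descending and then by term ascending.
 */
public class MincountArgument {

    static List<String> topTerms(Map<String, Integer> shardCounts, int limit, int mincount) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Integer> e : shardCounts.entrySet()) {
            if (e.getValue() >= mincount) {
                terms.add(e.getKey());
            }
        }
        terms.sort((a, b) -> {
            int byCount = Integer.compare(shardCounts.get(b), shardCounts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b); // tie-break: term ascending
        });
        return new ArrayList<>(terms.subList(0, Math.min(limit, terms.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> shardA = new TreeMap<>();
        shardA.put("apple", 4);
        shardA.put("banana", 2);
        shardA.put("cherry", 0); // indexed in the field, but no matches
        shardA.put("date", 0);
        int limit = 3;

        // Case 1: fewer than `limit` terms come back with mincount=1, so every
        // term missing from that response has count 0 in shardA.
        System.out.println(topTerms(shardA, limit, 1)); // [apple, banana]
        System.out.println(topTerms(shardA, limit, 0)); // [apple, banana, cherry]

        // Case 2: `limit` terms with count >= 1 come back, so the response
        // matches the mincount=0 response.
        shardA.put("cherry", 1);
        shardA.put("date", 1);
        System.out.println(topTerms(shardA, limit, 1)); // [apple, banana, cherry]
        System.out.println(topTerms(shardA, limit, 0)); // [apple, banana, cherry]
    }
}
```

      In the first case the mincount=1 response omits only cherry and date, both of which have a count of 0 in shardA; in the second, both mincount values return the same three terms.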

      Because of this, there is no scenario in which using mincount=1 results in more work than mincount=0 would have required. The reduction in work when mincount=1 is therefore always either a moot point or a net win.

      The belief here is that it is safe to change the default of facet.distrib.mco so that facet.mincount=1 is used whenever appropriate. The overall performance gain can be significant, and no performance cost has been observed.


            People

              Assignee: Unassigned
              Reporter: Dennis Gove (dpgove)
              Votes: 1
              Watchers: 5
