Solr / SOLR-9152

Change the default of facet.distrib.mco from false to true



    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved


      SOLR-8988 added a new query option, facet.distrib.mco, which, when set to true, allows the use of facet.mincount=1 in cloud mode. The previous behavior, and the current default, is that facet.mincount is forced to 0 when in cloud mode.

      What exactly would be changed?

      The default of facet.distrib.mco=false would be changed to facet.distrib.mco=true.

      When is this option effective?

      From the documentation,

       * If we are returning facet field counts, are sorting those facets by their count, and the minimum count to return is > 0,
       * then allow the use of facet.mincount = 1 in cloud mode. To enable this use facet.distrib.mco=true.
       * i.e. If the following three conditions are met in cloud mode: facet.sort=count, facet.limit > 0, facet.mincount > 0.
       * Then use facet.mincount=1.
       * Previously and by default facet.mincount will be explicitly set to 0 when in cloud mode for this condition.
       * In SOLR-8599 and SOLR-8988, significant performance increase has been seen when enabling this optimization.
       * Note: enabling this flag has no effect when the conditions above are not met. For those other cases the default behavior is sufficient.
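
      The documented conditions can be sketched as a small decision function. This is a hypothetical illustration of the logic described above, not Solr's actual implementation; the function name and signature are made up for clarity.

      ```python
      # Toy sketch of the documented condition check (not Solr's real code).
      # The optimization applies only when, in cloud mode:
      #   facet.sort=count, facet.limit > 0, and facet.mincount > 0.

      def effective_shard_mincount(facet_sort: str, facet_limit: int,
                                   facet_mincount: int, distrib_mco: bool) -> int:
          """Return the facet.mincount value sent to each shard in cloud mode."""
          optimizable = (facet_sort == "count"
                         and facet_limit > 0
                         and facet_mincount > 0)
          if optimizable and distrib_mco:
              # Optimization applies: shards may skip terms with zero matches.
              return 1
          # Default behavior: mincount is explicitly set to 0 per shard.
          return 0
      ```

      Note how the flag has no effect unless all three conditions hold, matching the documentation's final remark.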

      What is the result of turning this option on?

      When facet.distrib.mco=true is used, and the conditions above are met, then when Solr is sending requests off to the various shards it will include facet.mincount=1. The result of this is that only terms with a count > 0 will be considered when processing the request for that shard. This can result in a significant performance gain when the field has high cardinality and the matching docset is relatively small because terms with 0 matches will not be considered.

      As shown in SOLR-8988, the runtime of a single query was reduced from 20 seconds to less than 1 second.
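
      For illustration, a request that opts in to the optimization might be built like this. The collection name, field name, and host are placeholder examples, not values from this issue.

      ```python
      from urllib.parse import urlencode

      # Hypothetical query against a SolrCloud collection; all names are examples.
      params = {
          "q": "*:*",
          "facet": "true",
          "facet.field": "category",     # assume a high-cardinality field
          "facet.sort": "count",
          "facet.limit": 10,
          "facet.mincount": 1,
          "facet.distrib.mco": "true",   # opt in to the mincount=1 optimization
      }
      query_string = urlencode(params)
      url = "http://localhost:8983/solr/mycollection/select?" + query_string
      ```

      With these parameters, all three documented conditions are met, so the coordinating node will forward facet.mincount=1 to each shard.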

      Can this change result in worse performance?

      The current thinking is no: worse performance should not be experienced, even in non-optimal scenarios. From the comments in SOLR-8988:

      Consider you asked for up to 10 terms from shardA with mincount=1 but you received only 5 terms back. In this case you know, definitively, that a term seen in the response from shardB but not in the response from shardA could have at most a count of 0 in shardA. If it had any other count in shardA then it would have been returned in the response from shardA.

      Also, if you asked for up to 10 terms from shardA with mincount=1 and you get back a response with 10 terms having a count >= 1 then the response is identical to the one you'd have received if mincount=0.

      Because of this, there isn't a scenario where the response would result in more work than would have been required if mincount=0. For this reason, the decrease in required work when mincount=1 is always either a moot point or a net win.
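
      The argument above can be checked with a toy model of a single shard's facet response. This is an illustrative sketch, not Solr's merge logic; the helper function and term counts are invented for the example.

      ```python
      # Toy model of one shard's per-field facet response (not Solr internals).
      # Shows that mincount=1 drops only zero-count terms, which carry no
      # information the coordinator needs.

      def shard_top_terms(counts: dict, limit: int, mincount: int) -> list:
          """Return up to `limit` (term, count) pairs, sorted by descending count."""
          ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
          return [(t, c) for t, c in ranked if c >= mincount][:limit]

      # shardA has only 5 terms with nonzero counts for the matching docset.
      shard_a = {"a": 9, "b": 7, "c": 3, "d": 2, "e": 1, "f": 0, "g": 0}

      with_mco = shard_top_terms(shard_a, limit=10, mincount=1)       # 5 terms back
      without_mco = shard_top_terms(shard_a, limit=10, mincount=0)    # 7 terms back
      ```

      The mincount=1 response returns fewer than the requested 10 terms, so any term absent from it is known to have a count of 0 on that shard; the nonzero entries are identical in both responses.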

      The belief here is that it is safe to change the default of facet.distrib.mco so that facet.mincount=1 is used when appropriate. The overall performance gain can be significant, and no performance cost has been observed.


              Assignee: Unassigned
              Reporter: Dennis Gove (dpgove)
              Votes: 1
              Watchers: 5