SOLR-6154

SolrCloud: facet range option f.<field>.facet.mincount=1 omits buckets on response

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.5.1, 4.8.1
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Description

      Attached:

      • PDF with instructions on how to replicate the issue.
      • data.xml to replicate the index.

      The f.<field>.facet.mincount option on a distributed search gives an inconsistent list of buckets for a range facet.

      Some buckets are omitted when using the option "f.<field>.facet.mincount=1".

      The Solr logs do not indicate any error or warning during execution.
      Neither the debug=true option nor increased log levels for the FacetComponent provide any hints about this behaviour.

      Replicated the issue on both Solr 4.5.1 & 4.8.1.
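The exact request is in the attached PDF and is not reproduced here. As a rough sketch, a request consistent with the responses below (field price, start 0, end 1000, gap 50, plus before/after/between counts) could be built as follows. The host, collection name, q=*:* and facet.range.other=all are assumptions, not taken from the report.

```python
from urllib.parse import urlencode

# Assumed endpoint: host and collection name are hypothetical placeholders.
BASE_URL = "http://localhost:8983/solr/collection1/select"


def range_facet_url(mincount=None):
    """Build a range-facet request on the price field.

    Start/end/gap are read off the responses in this report; q=*:* and
    facet.range.other=all are assumptions (the latter because the responses
    include before/after/between counts).
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.range", "price"),
        ("f.price.facet.range.start", "0"),
        ("f.price.facet.range.end", "1000"),
        ("f.price.facet.range.gap", "50"),
        ("f.price.facet.range.other", "all"),
    ]
    if mincount is not None:
        # The per-field option that triggers the missing buckets.
        params.append(("f.price.facet.mincount", str(mincount)))
    return BASE_URL + "?" + urlencode(params)


# Without mincount: all 20 buckets come back, zeros included.
print(range_facet_url())
# With mincount=1: buckets may go missing on SolrCloud before the fix.
print(range_facet_url(1))
```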

      Example:

      Removing the f.<field>.facet.mincount=1 option gives the expected list of buckets for the 6 matched documents:

      <lst name="facet_ranges">
      <lst name="price">
      <lst name="counts">
      <int name="0.0">0</int>
      <int name="50.0">1</int>
      <int name="100.0">0</int>
      <int name="150.0">3</int>
      <int name="200.0">0</int>
      <int name="250.0">1</int>
      <int name="300.0">0</int>
      <int name="350.0">0</int>
      <int name="400.0">0</int>
      <int name="450.0">0</int>
      <int name="500.0">0</int>
      <int name="550.0">0</int>
      <int name="600.0">0</int>
      <int name="650.0">0</int>
      <int name="700.0">0</int>
      <int name="750.0">1</int>
      <int name="800.0">0</int>
      <int name="850.0">0</int>
      <int name="900.0">0</int>
      <int name="950.0">0</int>
      </lst>
      <float name="gap">50.0</float>
      <float name="start">0.0</float>
      <float name="end">1000.0</float>
      <int name="before">0</int>
      <int name="after">0</int>
      <int name="between">2</int>
      </lst>
      </lst>

      Adding the f.<field>.facet.mincount=1 option removes the zero-count buckets, but it also omits the bucket <int name="250.0">:

      <lst name="facet_ranges">
      <lst name="price">
      <lst name="counts">
      <int name="50.0">1</int>
      <int name="150.0">3</int>
      <int name="750.0">1</int>
      </lst>
      <float name="gap">50.0</float>
      <float name="start">0.0</float>
      <float name="end">1000.0</float>
      <int name="before">0</int>
      <int name="after">0</int>
      <int name="between">4</int>
      </lst>
      </lst>

      Resubmitting the query returns a different bucket list (the query may need to be resubmitted a couple of times):

      <lst name="facet_ranges">
      <lst name="price">
      <lst name="counts">
      <int name="150.0">3</int>
      <int name="250.0">1</int>
      </lst>
      <float name="gap">50.0</float>
      <float name="start">0.0</float>
      <float name="end">1000.0</float>
      <int name="before">0</int>
      <int name="after">0</int>
      <int name="between">2</int>
      </lst>
      </lst>

      Attachments:
      1. data.xml (6 kB, Ronald Matamoros)
      2. HowToReplicate.pdf (339 kB, Ronald Matamoros)


          Activity

          José Joaquín added a comment (edited):

          I'm also experiencing the same effect on Solr 4.7.1. In my case, it occurs when two collections are included in the range-faceting query.

          When f.<field>.facet.mincount=0, all the buckets are correctly returned.
          Otherwise, only the entries from one of the collections are returned.

          Erick Erickson added a comment:

          Right, I've tracked this down to how the lists from the shard requests are reconciled.

          Ronald:
          That was a great write-up; I just regret it took so long to make use of it!

          This will probably be fixed as part of SOLR-6187.
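To illustrate that diagnosis, here is a simplified model (not Solr's actual FacetComponent code) of a merge that trusts the first shard's bucket list. The two-shard split of the six documents is hypothetical, chosen so that the two merge orders reproduce the two truncated responses in the description. If shard responses can arrive in different orders, this would also explain why resubmitting the query changes the result.

```python
# Simplified model of the reported bug, NOT Solr's FacetComponent code.
# Each shard applied f.price.facet.mincount=1 locally, so its list omits
# zero-count buckets. The split of the six documents is hypothetical.
SHARD_A = {"50.0": 1, "150.0": 2, "750.0": 1}
SHARD_B = {"150.0": 1, "250.0": 1}


def merge_first_list_wins(shard_lists):
    """Merge range-facet counts assuming the first list holds every bucket."""
    first, *rest = shard_lists
    merged = dict(first)
    for counts in rest:
        for bucket, n in counts.items():
            if bucket in merged:
                merged[bucket] += n
            # else: a bucket missing from the first list is silently dropped
    return merged


# Shard A answers first: bucket 250.0 is lost.
print(merge_first_list_wins([SHARD_A, SHARD_B]))  # {'50.0': 1, '150.0': 3, '750.0': 1}
# Shard B answers first: buckets 50.0 and 750.0 are lost.
print(merge_first_list_wins([SHARD_B, SHARD_A]))  # {'150.0': 3, '250.0': 1}
```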

          Erick Erickson added a comment:

          At least I'm fixing it at the same time.

          Ronald Matamoros added a comment:

          Thanks Erick. I am sorry that I have not been able to contribute further to the ticket. Let me know if you want me to test anything on my side.

          Hoss Man added a comment:

          Erick, sorry for the late reply.

          I haven't looked in depth at your patch for this issue or SOLR-6187, but in response to your question on the mailing list...

          "The problem here is that it assumes that the first list in has all the counts that ever will be reported from any shard."

          You are almost certainly correct; it's very probable that the logic for distributed range faceting doesn't take into account the possibility of mincount suppressing buckets on one or more shards.

          The general strategy for dealing with this in field faceting and pivot faceting (which I suspect is what you are already doing in your patch) is to have the coordinator node modify the mincount params when it sends the shard requests, forcing mincount=0 so that it gets a count for every bucket from every shard, and then filter the combined response based on the original mincount.
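A minimal sketch of that strategy, under simplifying assumptions: request params are modelled as a plain dict, the helper names are hypothetical, and the two-shard split of the six documents is invented for illustration. Because every shard now reports every bucket, the merged result no longer depends on which shard response arrives first.

```python
from collections import Counter

# The 20 range buckets of the report: start 0, end 1000, gap 50.
BUCKETS = [f"{50.0 * i}" for i in range(20)]  # "0.0", "50.0", ..., "950.0"


def shard_params(user_params):
    """Hypothetical helper: copy the user's params, forcing every mincount to 0."""
    return {
        key: "0" if key == "facet.mincount" or key.endswith(".facet.mincount") else value
        for key, value in user_params.items()
    }


def shard_counts(nonzero):
    """What a shard returns with mincount=0: every bucket, zeros included."""
    return {b: nonzero.get(b, 0) for b in BUCKETS}


def merge_then_filter(shard_lists, mincount):
    """Sum counts across shards, then apply the user's original mincount."""
    totals = Counter()
    for counts in shard_lists:
        totals.update(counts)
    return {b: totals[b] for b in BUCKETS if totals[b] >= mincount}


user = {"facet.range": "price", "f.price.facet.mincount": "1"}
print(shard_params(user))  # f.price.facet.mincount is sent to the shards as "0"

# Hypothetical split of the six matched documents across two shards.
shard_a = shard_counts({"50.0": 1, "150.0": 2, "750.0": 1})
shard_b = shard_counts({"150.0": 1, "250.0": 1})
for order in ([shard_a, shard_b], [shard_b, shard_a]):
    # Same result for both orders: 50.0, 150.0, 250.0 and 750.0 are all kept.
    print(merge_then_filter(order, int(user["f.price.facet.mincount"])))
```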

          Not-recommended idea:

          I say "modify" because one of the strategies taken with field/pivot faceting when using "facet.sort=index" is this:

          // we're sorting by index order.
          // if minCount==0, we should always be able to get accurate results w/o
          // over-requesting or refining
          // if minCount==1, we should be able to get accurate results w/o
          // over-requesting, but we'll need to refine
          // if minCount==n (>1), we can set the initialMincount to
          // minCount/nShards, rounded up.
          // ...
          

          There is no sorting or "top-n" aspect to facet.range, so the idea of "over-requesting" doesn't apply, but the minCount/nShards idea still does. If the user requests a minCount of 10 and there are 3 shards, you could set f.foo.facet.mincount=4 for the shard requests, because unless at least one shard reports a count of at least 4, you can never satisfy the original mincount=10. HOWEVER: this strategy requires "refinement" requests, which we currently avoid in range faceting.

          I would not advise going with the refinement approach described above (hence the panel labeling it not recommended), because I think the current single-pass approach of range faceting is probably better for most common cases; we just need to force mincount=0 on the shard requests. But I wanted to post it for completeness, in case I'm missing something and you think it's a really good idea.
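For completeness, the arithmetic behind the minCount/nShards idea can be checked with a short sketch (the helper names are hypothetical). The per-shard threshold is minCount/nShards rounded up, and by the pigeonhole principle a bucket can only reach the total minCount if at least one shard reports at least that many. The threshold does not tell the coordinator a bucket's counts on the shards that suppressed it, which is why refinement requests would be needed.

```python
from itertools import product


def initial_shard_mincount(min_count, n_shards):
    """Per-shard mincount: ceil(min_count / n_shards) in integer arithmetic."""
    return -(-min_count // n_shards)


def threshold_is_safe(min_count, n_shards):
    """Brute-force check over per-shard counts 0..min_count (larger counts are
    trivially above the threshold): whenever the total reaches min_count,
    some shard is at or above the per-shard threshold."""
    t = initial_shard_mincount(min_count, n_shards)
    return all(
        max(counts) >= t
        for counts in product(range(min_count + 1), repeat=n_shards)
        if sum(counts) >= min_count
    )


print(initial_shard_mincount(10, 3))  # 4
print(threshold_is_safe(10, 3))       # True
```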

          Erick Erickson added a comment:

          Whew! I just committed this patch and... it forces mincount to 0 for the shard requests, which is in line with your comments. I.....

          Erick Erickson added a comment:

          Fixed with the check-in for SOLR-6187.

          Thanks again, Ronald, for a great problem write-up and a reproducible test case!

          Anshum Gupta added a comment:

          Bulk close after 5.0 release.


            People

            • Assignee: Erick Erickson
            • Reporter: Ronald Matamoros
            • Votes: 2
            • Watchers: 5
