Solr / SOLR-10480

Offset does not allow for full pagination in JSON Facet API

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.4.1
    • Fix Version/s: 6.6
    • Component/s: Facet Module, SolrCloud
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      I have a SolrCloud cluster and when I use the JSON facet API to do term faceting like this:

      json.facet={"results":{"type": "terms", "field": "my_field", "limit": 100, "offset": 100, "numBuckets": true}}

      it does work correctly.
      However, numBuckets reports more than 6 million buckets, yet as soon as I increase the offset to browse these buckets, the query stops returning anything (once the offset reaches around 300).
      Even stranger, with a larger limit such as 10'000, I can increase the offset to around 29'000 before it stops returning anything.
      The returned numBuckets does not change throughout.

      This is a big problem because we cannot paginate to the end of the buckets.

      Might be related to SOLR-7452, I don't know...
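
For readers reproducing the report, the paging pattern described above can be sketched as follows. This is only an illustration of how the `json.facet` parameter changes from page to page; the helper name `facet_page_param` and the zero-based `page` argument are assumptions, not part of Solr, and no request is actually sent.

```python
import json


def facet_page_param(field, limit, page):
    """Build the json.facet value for one page of a terms facet.

    `page` is zero-based, so the offset is simply page * limit.
    """
    facet = {
        "results": {
            "type": "terms",
            "field": field,
            "limit": limit,
            "offset": page * limit,
            "numBuckets": True,
        }
    }
    return json.dumps(facet)


# json.facet values for the first three pages of 100 buckets each.
params = [facet_page_param("my_field", 100, page) for page in range(3)]
for value in params:
    print(value)
```

The second entry corresponds to the request shown in the description (limit 100, offset 100).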

          Activity

          varunthacker Varun Thacker added a comment -

          I believe Karthik Ramachandran brought it up on the mailing list as well: http://solr.markmail.org/search/JSON+facet+bucket+list+not+correct+with+sharded+query

          yseeley@gmail.com Yonik Seeley added a comment -

          I think this is probably caused by a bad estimate for numBuckets... meaning that if you keep paging, you should get all the values (we're just wrong about the number of values).

          The_matrixme Maxime Darçot added a comment -

          This is definitely a pagination issue, not a numBuckets estimation issue. For a given facet query with a limit of 100, I can go up to an offset of around 300 before the query stops returning anything (which would suggest there are only around 300 buckets). With the same query and a limit of 10'000, I can go up to an offset of around 30'000 (which means there are definitely far more than 300 buckets).

          kramachandran@commvault.com Karthik Ramachandran added a comment -

          I agree with Maxime Darçot. In my case, with a sharded query, the facet query does not return results when paginating. I have sample code that demonstrates the issue: https://gist.github.com/mrkarthik/8848dfb54536df4a24103d6939b54f61

          yseeley@gmail.com Yonik Seeley added a comment -

          OK, I was able to reproduce this issue and am looking for the root cause.

          yseeley@gmail.com Yonik Seeley added a comment - - edited

          OK, found and patched the issue.

          As an aside, paging with an offset in distributed faceting has the same issue as distributed search. To return buckets N through N+10, each shard returns buckets 0 through N+10, the merger sorts the merged bucket list, and finally buckets N through N+10 are returned to the client.
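
The merge step described above can be sketched roughly as follows. This is a simplified illustration, not Solr's actual implementation: the function names are hypothetical, shards are modelled as plain dictionaries of term counts, and refinement or over-requesting is ignored. It mainly shows why the per-shard work grows with the offset.

```python
from collections import Counter


def shard_top_buckets(shard_counts, offset, limit):
    """Return one shard's buckets 0 .. offset+limit, sorted by count descending."""
    ranked = sorted(shard_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[: offset + limit]


def merged_page(shards, offset, limit):
    """Merge per-shard buckets, sort them, and slice out the requested page."""
    merged = Counter()
    for shard_counts in shards:
        # Every shard has to send offset+limit buckets, not just `limit`.
        for term, count in shard_top_buckets(shard_counts, offset, limit):
            merged[term] += count
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[offset : offset + limit]


# Two toy shards holding counts for the same faceted field.
shards = [
    {"a": 5, "b": 4, "c": 3, "d": 1},
    {"a": 2, "b": 1, "c": 6, "e": 2},
]
page = merged_page(shards, offset=2, limit=2)
print(page)  # [('b', 5), ('e', 2)]
```

Because each shard must return `offset + limit` buckets, deep pages become progressively more expensive, which is the cost the comment alludes to.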

          jira-bot ASF subversion and git services added a comment -

          Commit 1f7777693769bad1cd8fc40b339d00c43f16f9d1 in lucene-solr's branch refs/heads/master from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1f77776 ]

          SOLR-10480: fix offset param handling in JSON Facet API

          jira-bot ASF subversion and git services added a comment -

          Commit b90bfaba1f065598033b60f0ba5ffaa40053eb42 in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b90bfab ]

          SOLR-10480: fix offset param handling in JSON Facet API

          yseeley@gmail.com Yonik Seeley added a comment -

          Committed the fix. Thanks for the bug reports!

          The_matrixme Maxime Darçot added a comment -

          Well, thanks for the fix. Looking forward to using version 6.6.


            People

            • Assignee: yseeley@gmail.com Yonik Seeley
            • Reporter: The_matrixme Maxime Darçot
            • Votes: 1
            • Watchers: 5
