Solr
  1. Solr
  2. SOLR-6160

Exception possible with group.facet, range faceting, with distributed search

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.7.1
    • Fix Version/s: 4.9, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      I'm seeing a hard to reproduce bug when the following conditions are true:

      • Distributed search
      • group=true with group.field=xxx and group.facet=true
      • facet=true with facet.field and facet.range

      On a sharded request (isShard=true, distrib=false) that has requestPurpose=GET_FIELDS, sometimes facet=true but sometimes it isn't. Apparently, sometimes the earlier GET_FACETS phase satisfies the faceting alone and sometimes more is done in GET_FIELDS. So if facet=true on such a request and facet.range is set (or perhaps facet.query), then the bug will hit. Specifically both the facet.range and facet.query logic will conditionally call SimpleFacets.getGroupedFacetQueryCount, and both will conditionally do so when they detect that "group.field" has been set. BUT, for a GET_FIELDS shard request, the "group" parameter flag is explicitly removed from the request by StoredFieldsShardRequestFactory, effectively disabling grouping. So SimpleFacets.getGroupedFacetQueryCount will throw an error. The error is that "group.field" hasn't been set which technically isn't true.

      It's 100% reproducible on my customer's system. Reproducing it is tricky because it's not going to happen if the faceting logic doesn't happen on GET_FIELDS, which doesn't seem to happen often. I found that for a test query if I changed the facet.limit to a sufficiently high number then it trips, but if it's low then it doesn't. I presume this has something to do with refining faceting counts such that a higher facet.limit increases the chance that the coordinating node (what do we call that?) will need to ask a shard for more counts beyond which was provided on the initial GET_FACETS phase.

      If anyone has pointers on how to reproduce this (such as in a test!), then that would help.

      Even though I don't have 100% understanding of the bug and have yet to reproduce it with test data, it seems to me the bug might be as simple as having SimpleFacets.getGroupedFacetQueryCount retrieve the group.field parameter directly from parameters instead of possibly failing to find it in rb.GetGroupingSpec() (because "group" wasn't set). After all, that is how the callers of this method determine wether or not they want to get a grouped query count.

        Activity

        Hide
        David Smiley added a comment -

        Here's the patch against 4.7. Two lines were replaced with one. Solr's tests pass and I would be surprised if this doesn't fix it on the production in system but it'll take time to deploy the fix to see.

        One thing about this simple patch I don't like is that group.field is having an effect even though group=true isn't on the request. That actually was the case before this patch as well since both callers of the method I changed simply looked for the group.field param. This isn't a big deal; arguably the client shouldn't be sending them in the first place. But nonetheless, it would be nice if a sharded request could have its parameters stripped (by prefix) so as not to have side-effects like this. Doing so would then raise the question of how, in a GET_FIELDS phase, a facet query could get a grouped count even though we don't actually want to group the results? <<shrug>> Thoughts welcome.

        Show
        David Smiley added a comment - Here's the patch against 4.7. Two lines were replaced with one. Solr's tests pass and I would be surprised if this doesn't fix it on the production in system but it'll take time to deploy the fix to see. One thing about this simple patch I don't like is that group.field is having an effect even though group=true isn't on the request. That actually was the case before this patch as well since both callers of the method I changed simply looked for the group.field param. This isn't a big deal; arguably the client shouldn't be sending them in the first place. But nonetheless, it would be nice if a sharded request could have its parameters stripped (by prefix) so as not to have side-effects like this. Doing so would then raise the question of how, in a GET_FIELDS phase, a facet query could get a grouped count even though we don't actually want to group the results? <<shrug>> Thoughts welcome.
        Hide
        Peter Ciuffetti added a comment -

        Confirming that this patch has resolved the issue in the SolrCloud environment where it was initially discovered. Thanks! Let me know if you want me to post any query logs or other data from the environment.

        Show
        Peter Ciuffetti added a comment - Confirming that this patch has resolved the issue in the SolrCloud environment where it was initially discovered. Thanks! Let me know if you want me to post any query logs or other data from the environment.
        Hide
        David Smiley added a comment -

        That confirmation works for me. I'll commit to trunk, 4x, and the 4.9 release branch momentarily.

        Show
        David Smiley added a comment - That confirmation works for me. I'll commit to trunk, 4x, and the 4.9 release branch momentarily.
        Hide
        ASF subversion and git services added a comment -

        Commit 1603310 from David Smiley in branch 'dev/trunk'
        [ https://svn.apache.org/r1603310 ]

        SOLR-6160: bugfix when facet query or range with group facets and distributed

        Show
        ASF subversion and git services added a comment - Commit 1603310 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1603310 ] SOLR-6160 : bugfix when facet query or range with group facets and distributed
        Hide
        ASF subversion and git services added a comment -

        Commit 1603312 from David Smiley in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1603312 ]

        SOLR-6160: bugfix when facet query or range with group facets and distributed

        Show
        ASF subversion and git services added a comment - Commit 1603312 from David Smiley in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1603312 ] SOLR-6160 : bugfix when facet query or range with group facets and distributed
        Hide
        ASF subversion and git services added a comment -

        Commit 1603313 from David Smiley in branch 'dev/branches/lucene_solr_4_9'
        [ https://svn.apache.org/r1603313 ]

        SOLR-6160: bugfix when facet query or range with group facets and distributed

        Show
        ASF subversion and git services added a comment - Commit 1603313 from David Smiley in branch 'dev/branches/lucene_solr_4_9' [ https://svn.apache.org/r1603313 ] SOLR-6160 : bugfix when facet query or range with group facets and distributed

          People

          • Assignee:
            David Smiley
            Reporter:
            David Smiley
          • Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development