Solr
  1. Solr
  2. SOLR-7116

Facet refinement shard request should disable other faceting types

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1
    • Component/s: faceting
    • Labels:
      None

      Description

      While examining FacetComponent in the process of adding the new facet.heatmap faceting type, I observed that distributed shard refinement requests were built by copying the existing parameters and then modifying ones pertinent to facet.field or facet.pivot depending on the type of refinement requests (which in turn only happens some of the time, not too often). Those are the only types of faceting that have a refinement phase. These refinement requests should not have facet.query, facet.date, facet.range, facet.interval, or facet.heatpmap since they don't participate in refinement... and furthermore, facet.field and facet.pivot have their own dedicated refinement requests and so a facet.field request should not have options for facet.pivot. But this isn't taken care of, just facet.query is removed.

        Activity

        Hide
        David Smiley added a comment -

        The attached patch fixes the problem. When a refinement request is being created for facet.field or facet.pivot, it removes the main facet types first now with a little utility method, before subsequently adding the parameters it needs. I also saw that in the event there is a need to do a facet.pivot refinement but not a facet.field refinement, it seems to create and add a new shard request for facet.field anyway (with essentially no search options), so I fixed that.

        To observe the problem, I locally hacked TestDistributedSearch and added debug=track to observe what happens when I combine a facet.field request that needs refinement with range faceting. And indeed the refinement request did the range faceting pointlessly. I guess I could add a proper test for this by examining debug=track.

        As an aside, I can't help but feel FacetComponent needs a re-architecture, along with its tests such that these things can be combined without the code for each being intertwined, and to enable random testing of different types of faceting together instead of testing these purely in isolation.

        Chris Hostetter (Unused) would you mind looking at the patch? It's small.

        Show
        David Smiley added a comment - The attached patch fixes the problem. When a refinement request is being created for facet.field or facet.pivot, it removes the main facet types first now with a little utility method, before subsequently adding the parameters it needs. I also saw that in the event there is a need to do a facet.pivot refinement but not a facet.field refinement, it seems to create and add a new shard request for facet.field anyway (with essentially no search options), so I fixed that. To observe the problem, I locally hacked TestDistributedSearch and added debug=track to observe what happens when I combine a facet.field request that needs refinement with range faceting. And indeed the refinement request did the range faceting pointlessly. I guess I could add a proper test for this by examining debug=track. As an aside, I can't help but feel FacetComponent needs a re-architecture, along with its tests such that these things can be combined without the code for each being intertwined, and to enable random testing of different types of faceting together instead of testing these purely in isolation. Chris Hostetter (Unused) would you mind looking at the patch? It's small.
        Hide
        Hoss Man added a comment -

        skimming the patch looks ok to me ... one point to clarify...

        ... furthermore, facet.field and facet.pivot have their own dedicated refinement requests and so a facet.field request should not have options for facet.pivot. ...

        that's not entirely true. The point behind a lot of the shard request logic is to try and share requests to the shards whenever possible. just like we piggy back facet facet refinement on PURPOSE_GET_FIELDS if we can find one, we also try to combine facet.field refinement with facet.pivot refinement if both are in use.

        so for example, if you have a request with facet.field=XX&facet.pivot=YY,ZZ then you should see one request each shard with both facet.field=XX and facet.pivot=YY refinements requested at the same time – then once YY has been refined, a second request to every shard with facet.pivot=YY,ZZ refinements. (ie: only 2 total refinement request to each shard, not 3)

        I think from looking at your patch, this should still be true? did you try this in your manual tests?

        I guess I could add a proper test for this by examining debug=track.

        I've thought about trying to do whitebox tests to verify that a single cloud request triggers the expected shard requests, but i could never figure out a decent idea for how to do it w/o a lot of complex "mocking" up of solr ... if you've got an idea in mind, please open an issue with your thoughts.

        Show
        Hoss Man added a comment - skimming the patch looks ok to me ... one point to clarify... ... furthermore, facet.field and facet.pivot have their own dedicated refinement requests and so a facet.field request should not have options for facet.pivot. ... that's not entirely true. The point behind a lot of the shard request logic is to try and share requests to the shards whenever possible. just like we piggy back facet facet refinement on PURPOSE_GET_FIELDS if we can find one, we also try to combine facet.field refinement with facet.pivot refinement if both are in use. so for example, if you have a request with facet.field=XX&facet.pivot=YY,ZZ then you should see one request each shard with both facet.field=XX and facet.pivot=YY refinements requested at the same time – then once YY has been refined, a second request to every shard with facet.pivot=YY,ZZ refinements. (ie: only 2 total refinement request to each shard, not 3) I think from looking at your patch, this should still be true? did you try this in your manual tests? I guess I could add a proper test for this by examining debug=track. I've thought about trying to do whitebox tests to verify that a single cloud request triggers the expected shard requests, but i could never figure out a decent idea for how to do it w/o a lot of complex "mocking" up of solr ... if you've got an idea in mind, please open an issue with your thoughts.
        Hide
        David Smiley added a comment -

        I think from looking at your patch, this should still be true? did you try this in your manual tests?

        I did not; and I don't believe it will do what you say for the refinement phase because (a) the code declared that it doesn't do this for refinement requests – see handleResponses() – an excerpt is below. and (b) my reading of the code for distributedProcess() shows new ShardRequest happening and it's only used for facet.field refinement, and enqueuePivotFacetShardRequests() creates its own as well just for pivot refinement. My patch doesn't change that.

              // at present PURPOSE_REFINE_FACETS and PURPOSE_REFINE_PIVOT_FACETS
              // don't co-exist in individual requests, but don't assume that
              // will always be the case
        

        That said; I'm sure you know this confusing code more than I do and I would not be too surprised to find out my understanding is wrong.

        p.s. I'm on vacation this week so sorry for any delayed replies

        Show
        David Smiley added a comment - I think from looking at your patch, this should still be true? did you try this in your manual tests? I did not; and I don't believe it will do what you say for the refinement phase because (a) the code declared that it doesn't do this for refinement requests – see handleResponses() – an excerpt is below. and (b) my reading of the code for distributedProcess() shows new ShardRequest happening and it's only used for facet.field refinement, and enqueuePivotFacetShardRequests() creates its own as well just for pivot refinement. My patch doesn't change that. // at present PURPOSE_REFINE_FACETS and PURPOSE_REFINE_PIVOT_FACETS // don't co-exist in individual requests, but don't assume that // will always be the case That said; I'm sure you know this confusing code more than I do and I would not be too surprised to find out my understanding is wrong. p.s. I'm on vacation this week so sorry for any delayed replies
        Hide
        Hoss Man added a comment -

        I'm almost certainly wrong.

        (I have much more faith in the correctness of comments i put in the code when i wrote it, then in my own memory of the code)

        Show
        Hoss Man added a comment - I'm almost certainly wrong. (I have much more faith in the correctness of comments i put in the code when i wrote it, then in my own memory of the code)
        Hide
        David Smiley added a comment -

        Okay. Then I plan to commit this ~Tuesday unless you have feelings to the contrary (e.g. it's not ready, needs more testing).

        CHANGES.txt proposed text:

        SOLR-7116: Distributed facet refinement requests would needlessly compute other types of faceting that have already been computed. (David Smiley, Hossman)

        Show
        David Smiley added a comment - Okay. Then I plan to commit this ~Tuesday unless you have feelings to the contrary (e.g. it's not ready, needs more testing). CHANGES.txt proposed text: SOLR-7116 : Distributed facet refinement requests would needlessly compute other types of faceting that have already been computed. (David Smiley, Hossman)
        Hide
        ASF subversion and git services added a comment -

        Commit 1662167 from David Smiley in branch 'dev/trunk'
        [ https://svn.apache.org/r1662167 ]

        SOLR-7116: Distrib facet refinement shouldn't re-compute other facet types

        Show
        ASF subversion and git services added a comment - Commit 1662167 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1662167 ] SOLR-7116 : Distrib facet refinement shouldn't re-compute other facet types
        Hide
        ASF subversion and git services added a comment -

        Commit 1662168 from David Smiley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1662168 ]

        SOLR-7116: Distrib facet refinement shouldn't re-compute other facet types

        Show
        ASF subversion and git services added a comment - Commit 1662168 from David Smiley in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1662168 ] SOLR-7116 : Distrib facet refinement shouldn't re-compute other facet types
        Hide
        David Smiley added a comment -

        FYI the committed version contains some small refactoring for code clarity; there is no real change.

        Show
        David Smiley added a comment - FYI the committed version contains some small refactoring for code clarity; there is no real change.
        Hide
        Shalin Shekhar Mangar added a comment -

        FYI David, I'm working on SOLR-7147 where you can track individual shard requests to assert in your tests.

        Show
        Shalin Shekhar Mangar added a comment - FYI David, I'm working on SOLR-7147 where you can track individual shard requests to assert in your tests.
        Hide
        David Smiley added a comment -

        Thanks for the FYI Shalin. I'll look into adding a test for this issue at some point.

        Show
        David Smiley added a comment - Thanks for the FYI Shalin. I'll look into adding a test for this issue at some point.
        Hide
        Timothy Potter added a comment -

        Bulk close after 5.1 release

        Show
        Timothy Potter added a comment - Bulk close after 5.1 release

          People

          • Assignee:
            David Smiley
            Reporter:
            David Smiley
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development