Solr / SOLR-3406

Support grouped range and query facets.

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA, 5.0
    • Component/s: None
    • Labels:
      None

      Description

      Need the ability to support grouped range and query facets. Grouped facet fields have already been implemented in SOLR-2898 but we still need the ability to compute grouped range and query facets.
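
For illustration, a request that combines grouping with a query facet and a range facet might look like the sketch below. The parameter names (group, group.field, group.facet, facet.query, facet.range and its start/end/gap) are standard Solr parameters; the class name, the product_id and price fields, and the bounds are placeholders assumed for this example and do not come from the issue.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

final class GroupedFacetRequest {

    // Builds the query string of a select request. The parameter names are
    // Solr's; the field names and bounds passed in are placeholders.
    static String build(String groupField, String facetQuery, String rangeField,
                        int start, int end, int gap) {
        List<String[]> params = new ArrayList<>();
        params.add(new String[] {"q", "*:*"});
        params.add(new String[] {"group", "true"});
        params.add(new String[] {"group.field", groupField});
        params.add(new String[] {"group.facet", "true"}); // ask for group-based facet counts
        params.add(new String[] {"facet", "true"});
        params.add(new String[] {"facet.query", facetQuery});
        params.add(new String[] {"facet.range", rangeField});
        params.add(new String[] {"facet.range.start", String.valueOf(start)});
        params.add(new String[] {"facet.range.end", String.valueOf(end)});
        params.add(new String[] {"facet.range.gap", String.valueOf(gap)});

        StringBuilder sb = new StringBuilder();
        for (String[] p : params) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(p[0])).append('=').append(encode(p[1]));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Hypothetical fields: group by product_id, facet on price.
        System.out.println(build("product_id", "price:[0 TO 100]", "price", 0, 300, 100));
    }
}
```

With group.facet=true, each facet count is meant to reflect the number of matching groups rather than the number of matching documents.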

      1. SOLR-3406.patch
        6 kB
        Martijn van Groningen
      2. SOLR-2898-backport.patch
        15 kB
        David Boychuck
      3. SOLR-3406.patch
        5 kB
        David Boychuck
      4. SOLR-3406.patch
        4 kB
        Martijn van Groningen

          Activity

          Martijn van Groningen added a comment -

          Committed to trunk and branch4x.

          Martijn van Groningen added a comment -

          Updated patch to also support facet.range parameter.

          Martijn van Groningen added a comment -

          Sure. I think what is in here can be committed. The only thing that needs work is caching. Right now, when facet.query is used in combination with group.facet=true, caching doesn't take place. I think this can be fixed in a new issue that refers to this issue. In the meantime the patch in this issue can get committed.

          Hoss Man added a comment -

          Martijn: can you triage this for 4.0? commit if you think it's ready, otherwise remove the fix version?

          David Boychuck added a comment -

          Will this be committed and released in the alpha?

          Martijn van Groningen added a comment -

          Nope, not really. A branch4x will first be created in Subversion, and then a 4.0-alpha version will be released (this will be an official release). If everything goes as planned, this should happen within a month or two.

          David Boychuck added a comment -

          Yes it would be for internal usage. Is there an ETA for the release of 4.0? I need the functionality provided in 4.0 for facet grouping but my company will not want to run an experimental build of Solr.

          Martijn van Groningen added a comment -

          Yes, it can be rewritten, but why do you want to do this? Is this for internal usage? Solr 3.6 was the last major 3.x release. The 3.x release line is now in maintenance mode.

          David Boychuck added a comment -

          Do you think TermGroupFacetCollector could be rewritten so that it is compatible with 3.6?

          Martijn van Groningen added a comment -

          Yep the TermGroupFacetCollector isn't backported. That is why SOLR-2898 was never backported.

          David Boychuck added a comment -

          I have attached a patch for the backport. It looks like the TermGroupFacetCollector will need to be rewritten so that it is compatible with Solr 3.6.

          David Boychuck added a comment -

          What would it take to get the changes in SOLR-2898 backported? I have tried but am not having any success.

          David Boychuck added a comment -

          So I would not be able to apply patch SOLR-3406.patch and SOLR-2898.patch to Solr 3.6 for facet.field and facet.query grouping?

          Martijn van Groningen added a comment -

          "Which patches would I have to apply to get this functionality to work in Solr 3.6?"

          Hmmm... The TermAllGroupsCollector class is also available in 3.6 (in the grouping contrib). I guess the logic in your patch can also be implemented in the 3.6 code base. I think the attached patches will not apply, because of changes in trunk that have never been backported.

          David Boychuck added a comment -

          Adding a JUnit test for facet.query range queries.

          David Boychuck added a comment -

          Added another JUnit test.

          David Boychuck added a comment -

          Which patches would I have to apply to get this functionality to work in Solr 3.6?

          Martijn van Groningen added a comment -

          I was just a bit too late! Nice work.

          Martijn van Groningen added a comment -

          Yeah, it is a large code base.

          I updated your patch. You are heading in the right direction. Inside the getGroupedFacetQueryCount method a query is executed that returns a group count. This count is put into the response.

          I modified your test changes as well, and the grouped query faceting seems to work in the test.

          David Boychuck added a comment -

          I think I got it working once I actually did the search. I added this line:

          searcher.search(new MatchAllDocsQuery(), base.getTopFilter(), collector);
          
          David Boychuck added a comment -

          I attached it... maybe you can tell me if I'm going in the right direction. I am a bit overwhelmed by the codebase

          David Boychuck added a comment -

          Yes, but it's not doing what I expect. I will attach it.

          Martijn van Groningen added a comment -

          "Do you mean the TermGroupFacetCollector?"

          That collector is used for computing grouped facets for a field. A query facet is just a query that is executed "inside" the main query, and for this query the hit count is computed as if it were a facet. That is why I think TermAllGroupsCollector can be used to compute this hit count.
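
To make the difference between a plain and a grouped query facet count concrete, here is a minimal sketch in plain Java. It does not use Lucene or the TermAllGroupsCollector API; the Doc class, its fields, and the sample data are hypothetical. It only models the idea described above: the plain count is the number of documents matching both the main query and the facet query, while the grouped count is the number of distinct group values among those documents.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class GroupedQueryFacet {

    // Hypothetical document: a group value plus one numeric field.
    static final class Doc {
        final String group;
        final int price;

        Doc(String group, int price) {
            this.group = group;
            this.price = price;
        }
    }

    // Ungrouped facet count: documents matching both the main and the facet query.
    static int plainCount(List<Doc> docs, Predicate<Doc> mainQuery, Predicate<Doc> facetQuery) {
        int count = 0;
        for (Doc doc : docs) {
            if (mainQuery.test(doc) && facetQuery.test(doc)) {
                count++;
            }
        }
        return count;
    }

    // Grouped facet count: distinct group values among the same documents.
    static int groupedCount(List<Doc> docs, Predicate<Doc> mainQuery, Predicate<Doc> facetQuery) {
        Set<String> groups = new HashSet<>();
        for (Doc doc : docs) {
            if (mainQuery.test(doc) && facetQuery.test(doc)) {
                groups.add(doc.group);
            }
        }
        return groups.size();
    }

    static List<Doc> sampleDocs() {
        return Arrays.asList(
                new Doc("A", 10), new Doc("A", 20),
                new Doc("B", 15), new Doc("B", 150),
                new Doc("C", 200));
    }

    public static void main(String[] args) {
        List<Doc> docs = sampleDocs();
        Predicate<Doc> matchAll = d -> true;
        Predicate<Doc> cheap = d -> d.price < 100;
        System.out.println("plain:   " + plainCount(docs, matchAll, cheap));   // 3 documents
        System.out.println("grouped: " + groupedCount(docs, matchAll, cheap)); // 2 groups (A, B)
    }
}
```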

          Did you already have a chance to create some code? If so create a patch and attach it to this issue.

          David Boychuck added a comment -

          "The TermAllGroupsCollector implementation can be used to compute the counts. It only needs to be integrated in the Solr facet code."

          Do you mean the TermGroupFacetCollector?

          Martijn van Groningen added a comment -

          "Do you use IRC or any other type of IM software to communicate? Might make development easier."

          I'm usually in #lucene and #lucene-dev on IRC.

          "Do you mean that this grouping is already functional on facet.query?"

          The TermAllGroupsCollector implementation can be used to compute the counts. It only needs to be integrated in the Solr facet code.

          David Boychuck added a comment -

          "There are already collectors in the grouping module that compute a grouped count for a query."

          Do you mean that this grouping is already functional on facet.query?

          David Boychuck added a comment -

          Yes I am using group.truncate since facet.query doesn't have grouping functionality yet.

          We could start on grouped facet queries since this really is my main priority.

          Do you use IRC or any other type of IM software to communicate? Might make development easier.

          Martijn van Groningen added a comment -

          "I just realized that computing stats on an ungrouped docset still wouldn't work since I still need to do query facets on price ranges."

          I don't follow this completely. If you use query or range facets this should just work, right? Or are you using group.facet or group.truncate in the same request?

          "Have you started on GroupedFacets?"

          Nope. I created a group facet collector in the Lucene grouping module, which is used by Solr in SimpleFacets.

          If I remember correctly, both query facets and range facets in Solr are queries executed on a top-level searcher. For each query a count is computed (based on the facet and main query result) and put in the response. For range facets, multiple queries are executed based on the start, end and gap. I think the grouped variant just needs to compute a grouped count for each query being executed. There are already collectors in the grouping module that compute a grouped count for a query.
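
As a rough illustration of the range case, the sketch below expands start, end and gap into one bucket per gap and computes a grouped count for each bucket. It is plain Java, not Solr code: the class name, the parallel arrays, and the sample values are hypothetical, every document is assumed to match the main query, and the buckets are assumed to include their lower bound and exclude their upper bound.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class GroupedRangeFacet {

    // Returns, for each bucket's lower bound, the number of distinct groups that
    // have at least one document with a value in [lower, lower + gap).
    // values[i] and groups[i] describe document i (hypothetical layout).
    static Map<Integer, Integer> counts(int[] values, String[] groups,
                                        int start, int end, int gap) {
        if (gap <= 0) {
            throw new IllegalArgumentException("gap must be positive");
        }
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int lower = start; lower < end; lower += gap) {
            int upper = lower + gap;
            Set<String> seen = new HashSet<>();
            for (int doc = 0; doc < values.length; doc++) {
                if (values[doc] >= lower && values[doc] < upper) {
                    seen.add(groups[doc]); // one "query" per bucket, counted by group
                }
            }
            result.put(lower, seen.size());
        }
        return result;
    }

    public static void main(String[] args) {
        int[] prices = {10, 20, 15, 150, 200};
        String[] groups = {"A", "A", "B", "B", "C"};
        // Prints {0=2, 100=1, 200=1}
        System.out.println(counts(prices, groups, 0, 300, 100));
    }
}
```

Note that group B is counted in two buckets, because it has documents in both ranges; an ungrouped range facet on the same data would report 3, 1 and 1.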

          The only thing I'm worried about is caching. For each query or range facet a docset is computed, stored in the filter cache, and possibly reused for future requests. This docset is intersected with the docset matching the main query, which results in the count used in the response. We would need to do something similar.
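
One possible shape for "something similar" is sketched below, purely as an illustration and not as what the patch does. It uses java.util.BitSet in place of a Solr docset and a HashMap in place of the filter cache; the class, its methods, and the groupOfDoc array are hypothetical. The cached per-facet docset is intersected with the main-query docset, and the grouped count is then the number of distinct groups in the intersection.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class FacetDocSetCache {

    private final Map<String, BitSet> cache = new HashMap<>();
    int computations = 0; // how often a facet docset had to be computed

    // Returns the cached docset for a facet query, computing it on first use.
    BitSet docSet(String facetQuery, Function<String, BitSet> compute) {
        return cache.computeIfAbsent(facetQuery, q -> {
            computations++;
            return compute.apply(q);
        });
    }

    // Ungrouped count: size of (facet docset AND main docset).
    int count(String facetQuery, BitSet mainDocs, Function<String, BitSet> compute) {
        return intersect(facetQuery, mainDocs, compute).cardinality();
    }

    // Grouped count: distinct group values among the intersected documents.
    int groupedCount(String facetQuery, BitSet mainDocs, String[] groupOfDoc,
                     Function<String, BitSet> compute) {
        BitSet hits = intersect(facetQuery, mainDocs, compute);
        Set<String> groups = new HashSet<>();
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            groups.add(groupOfDoc[doc]);
        }
        return groups.size();
    }

    private BitSet intersect(String facetQuery, BitSet mainDocs,
                             Function<String, BitSet> compute) {
        BitSet hits = (BitSet) docSet(facetQuery, compute).clone(); // keep the cached set intact
        hits.and(mainDocs);
        return hits;
    }

    // Convenience helper for building a docset from document ids.
    static BitSet bits(int... docs) {
        BitSet set = new BitSet();
        for (int doc : docs) {
            set.set(doc);
        }
        return set;
    }
}
```

In this sketch only the docset is cached; the grouped count is recomputed per request from the intersection, since it depends on the main query.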


            People

            • Assignee:
              Martijn van Groningen
              Reporter:
              David Boychuck
            • Votes:
              1
              Watchers:
              3

                Time Tracking

                Estimated:
                504h
                Remaining:
                504h
                Logged:
                Not Specified
