Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4, 4.0-ALPHA
    • Component/s: search
    • Labels: None

      Description

      Child issue of SOLR-236. This issue is dedicated to field collapsing in general and all of its code (CollapseComponent, DocumentCollapsers and CollapseCollectors). The main goal is to finalize the request parameters and the response format.

      1. field-collapsing.patch
        205 kB
        Martijn van Groningen
      2. SOLR-236.patch
        47 kB
        Shalin Shekhar Mangar
      3. SOLR-1682.patch
        51 kB
        Martijn van Groningen
      4. SOLR-1682.patch
        53 kB
        Shalin Shekhar Mangar
      5. SOLR-1682_prototype.patch
        19 kB
        Yonik Seeley
      6. SOLR-1682_prototype.patch
        19 kB
        Martijn van Groningen
      7. SOLR-1682_prototype.patch
        41 kB
        Yonik Seeley
      8. SOLR-1682.patch
        51 kB
        Yonik Seeley
      9. SOLR-1682.patch
        63 kB
        Martijn van Groningen
      10. SOLR-1682.patch
        80 kB
        Yonik Seeley
      11. SOLR-1682.patch
        80 kB
        Yonik Seeley
      12. SOLR-1682.patch
        69 kB
        Yonik Seeley

        Issue Links

          Activity

          bronco added a comment -

          It doesn't work even if I set mincount=1 and limit=-1; I always get the wrong numFound result. I have been trying for two days now to find a way to make this work.

          My search page shows 2 results, but the pager thinks it has to render 3 pages because numFound says 22. This is really not good. Are there any working solutions out there?

          Martijn van Groningen made changes -
          Status Open [ 1 ] Closed [ 6 ]
          Resolution Fixed [ 1 ]
          Robert Muir made changes -
          Fix Version/s 3.4 [ 12316683 ]
          Fix Version/s 4.0 [ 12314992 ]
          Fix Version/s 3.3 [ 12316471 ]
          Robert Muir made changes -
          Fix Version/s 3.3 [ 12316471 ]
          Fix Version/s 3.2 [ 12316172 ]
          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Hoss Man made changes -
          Fix Version/s 3.2 [ 12316172 ]
          Fix Version/s Next [ 12315093 ]
          Otis Gospodnetic added a comment -

          Are there known trunk patches that make it possible to use field grouping/collapsing in distributed search based on what's in this SOLR-1682?

          Jan Høydahl added a comment -

          What's the state of backporting to 3.x?

          Bill Bell added a comment -

          Jasper van Veghel - See my patch https://issues.apache.org/jira/browse/SOLR-2242

          This will work if you use group.field on the same field you facet on with facet.field. Just make sure you set mincount=1 and limit=-1.

          Thanks.
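
          For illustration, a request along those lines might look like the following (the field name manu_exact and the host/core are placeholders, and the mincount/limit above are taken to mean the facet parameters):

          http://localhost:8983/solr/select?q=*:*&group=true&group.field=manu_exact&facet=true&facet.field=manu_exact&facet.mincount=1&facet.limit=-1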

          Jasper van Veghel added a comment -

          Is there any way to request the total number of groups, despite the (previously noted) performance penalty involved? We'd like to be able to provide complete paging and total counts, but we can't when the total number of documents is, say, 200 but the number of groups is, say, 190. That would mean that the last page in a 10-document pager would be empty.

          Bill Bell added a comment -

          Eric,

          Get the latest trunk. This is fixed.

          Bill Bell made changes -
          Link This issue blocks SOLR-2242 [ SOLR-2242 ]
          Bill Bell made changes -
          Link This issue blocks SOLR-2246 [ SOLR-2246 ]
          Eric Caron added a comment -

          Is anyone else having the issue that, when supplying an offset/start, the subset isn't being generated? For example, if I do start=0&rows=10, I get entries 1 through 10, but if I do start=10&rows=10, I get entries 1 through 20 (as of commit #1005652 in trunk).

          Jasper van Veghel added a comment -

          Great stuff, guys. We're using field collapsing to fold in URLs of documents in our index which have the same URL but multiple individual content parts. We can't merge these earlier in the indexing process, as each individual page content part has different access rights. As a result, not every piece of content is accessible to everyone, but different users with different access rights might still end up on the same URL and might find multiple results for the same page URL.

          As for faceting and highlighting - I've managed to merge in the faceting patch from SOLR-2098 and have taken the highlighting changes from Yonik's github commit (r997870). This seems to be working flawlessly, and all we're rooting for now is some love for distributed collapsing so we can use it across multiple shards. So far we haven't run into any issues with the current (August 20th) patch. Keep up the good work!

          Varun Gupta added a comment -

          Is there any workaround to use Highlight and Facet components along with grouping?

          Cuong Hoang added a comment -

          I tried out this patch in trunk, but it does not seem to work with other components, including Facet and Highlighter. These two components require docList in the ResponseBuilder instance passed to them, while the grouping code doesn't actually set docList and docSet like the normal QueryComponent does. Is this intentional or am I missing something?

          Lance Norskog added a comment -

          If I want the Google search-style "there are more results from this site" UI, I don't care about counts. I can just pull the field from the first N search results and hunt for repeats.

          Does this patch do that efficiently? Would a custom UpdateHandler or RequestHandler be the right way to do that?
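
          A minimal client-side sketch of that approach (plain Java; the field values and the "site" notion are illustrative, not part of the patch):

            import java.util.*;

            // Walk the first N results in rank order, keep the first hit per site,
            // and count how many additional hits each site had.
            public class ClientSideCollapse {
              public static void main(String[] args) {
                // (id, site) pairs as they might come back from an ordinary search
                String[][] results = {
                    {"doc1", "example.com"}, {"doc2", "shop.org"},
                    {"doc3", "example.com"}, {"doc4", "example.com"},
                    {"doc5", "news.net"}
                };

                Map<String, Integer> extraPerSite = new LinkedHashMap<>();
                List<String[]> kept = new ArrayList<>();
                for (String[] doc : results) {
                  String site = doc[1];
                  if (!extraPerSite.containsKey(site)) {
                    extraPerSite.put(site, 0);                  // first hit for this site: keep it
                    kept.add(doc);
                  } else {
                    extraPerSite.merge(site, 1, Integer::sum);  // repeat: just count it
                  }
                }
                for (String[] doc : kept) {
                  int more = extraPerSite.get(doc[1]);
                  System.out.println(doc[0] + (more > 0 ? " (+" + more + " more from " + doc[1] + ")" : ""));
                }
              }
            }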

          Yonik Seeley added a comment -

          I'd rather not tackle backporting immediately - it's going to be under a lot of flux.

          Grant Ingersoll added a comment -

          Can we back port to 3.x? How hard?

          Yonik Seeley added a comment -

          No, with the current algorithm we avoid keeping all of the groups in memory at once, so we never know exactly how many unique ones we hit. But an option to retrieve it is a good idea - we probably just don't want to do it by default.

          James Dyer added a comment -

          In addition to "matches" for total # docs, do we have a way to get the total # of groups?

          Yonik Seeley made changes -
          Attachment SOLR-1682.patch [ 12452646 ]
          Yonik Seeley added a comment -

          OK, here's another update: other little fixes + tests, and "matches" is added at the top level to give a complete count of the number of docs that matched the query.
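
          For reference, one possible placement of that count in the JSON response, per grouping command (the numbers are made up):

            "grouped":{
              "popularity":{
                "matches":25,
                "groups":[ ... ]
              }}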

          Yonik Seeley made changes -
          Attachment SOLR-1682.patch [ 12452602 ]
          Yonik Seeley added a comment -

          Here's a patch that adds support for retrieving scores also.
          I think we're getting close to something committable!

          Yonik Seeley made changes -
          Attachment SOLR-1682.patch [ 12452583 ]
          Yonik Seeley added a comment -

          Here's an updated patch that merges in Martijn's changes and implements some more tests (using the new JSON test method). I also went with the name "doclist" for now.

          Martijn van Groningen added a comment -

          2) the popularity of a group is the max among all docs in that group

          That can be done with a different sort in the second-phase collector. It means that the groups themselves are sorted on popularity desc (the most popular group ends up as the first result), but the documents inside a group are sorted on price asc. This can be confusing, since the document responsible for getting the group into the result set (the top 10) might not end up inside the group during the second phase. I'm not sure if end users expect / want this.

          had planned on group.limit to be the number of groups returned - so I just need to get used to the new way of thinking about these

          Can't we use the rows parameter for that?

          It looks fine in XML, but in JSON the representation of a doclist as "docs" it

          Maybe groupDocs fits as a good description.

          Yonik Seeley added a comment -

          A group currently looks like this:

                  {
                    "groupValue":1,
                    "matches":2,
                    "docs":{"numFound":2,"start":0,"docs":[
                        {
                          "id":"F8V7067-APL-KIT",
                          "price":19.95},
                        {
                          "id":"IW-02",
                          "price":11.5}]
                    }},
          

          It looks fine in XML, but in JSON the representation of a doclist has "docs" itself as part of it (so now we have "docs" nested directly in "docs"). Should we change the name of that outer "docs" to something else? "response", "matches", "topdocs", "doclist", or just live with it?

          Yonik Seeley added a comment -

          back to the naming of docsPerGroup: I guess if we stick with group.sort as the sort within a single group, then use of group.limit is perfectly consistent with that (I had planned on group.limit to be the number of groups returned - so I just need to get used to the new way of thinking about these)

          Yonik Seeley added a comment -

          Groups are sorted by the first and most relevant document in a group.

          Going back to a specific example... assuming we are sorting by price asc within a single group, but sorting groups by popularity desc (however that's defined).
          So we have 2 sane choices about what the popularity of a group is:
          1) the popularity of a group is the lowest price doc (i.e. the first in the list for that group). - this is what you did
          2) the popularity of a group is the max among all docs in that group

          I wonder which will be more useful to people? Do we need both?
          #2 can be trivially implemented with the existing collectors (just use different sorts at the different stages)
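
          As an illustration of option #2 using the parameters discussed in this thread (the group field manu_exact is just a placeholder; popularity and price are the fields from the running example):

          http://localhost:8983/solr/select?q=*:*&group=true&group.field=manu_exact&sort=popularity+desc&group.sort=price+asc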

          Martijn van Groningen added a comment -

          OK, so if I'm reading the patch correctly, it looks like the new group.sort you added decouples the sort of the groups from the sort of the documents within each group? (and group.sort specifies the sort of the docs within each group?)

          Yes it decouples the sort of the groups from the sort of the documents within the group and is specified via the group.sort parameter.

          What's the intended semantics of how to sort the groups?

          What happens is that documents inside a group are sorted by the group.sort parameter and the groups are sorted by the sort parameter. The groups are sorted with the most relevant (first) document of a group. So in your example the first group in the result is not the most popular, but the most expensive popular group (the first document in the first group will be).

          Are groups sorted by the popularity of the first doc....

          Groups are sorted by the first and most relevant document in a group. In your example, that is the document in the group with the highest price.

          Yonik Seeley added a comment -

          OK, so if I'm reading the patch correctly, it looks like the new group.sort you added decouples the sort of the groups from the sort of the documents within each group? (and group.sort specifies the sort of the docs within each group?)

          What's the intended semantics of how to sort the groups? For example: if I'm sorting the documents within each group by price desc, and I'm sorting the groups by popularity desc... what if the top docs kept in group A (those with the highest price) are not the docs with the highest popularity? Are groups sorted by the popularity of the first doc (i.e. the one with the highest price), or are they sorted by the highest popularity doc with group value A (even if it's not in the top N in that group)?

          Martijn van Groningen added a comment -

          True, always a hassle!

          I'm already using group.limit as a limit on the number of groups.

          That one seems unused in the current patch. The group.docsPerGroup is used for that purpose now, but I guess that will change.

          collapse.threshold is the name that the original field collapsing patches

          Yes, collapse.threshold only makes sense when you're collapsing.

          Yonik Seeley added a comment -

          Thanks Martijn, I'll try and merge your patch in with what I currently have (diffing patches... blech).

          I like group.limit and group.offset.

          I'm already using group.limit as a limit on the number of groups.
          collapse.threshold is the name that the original field collapsing patches used - but that name doesn't make as much sense when thinking about grouping.

          Martijn van Groningen made changes -
          Attachment SOLR-1682.patch [ 12452210 ]
          Martijn van Groningen added a comment -

          I've attached a new patch that allows the user to specify:
          group.sort=<field> <order> - This activates the TopGroupSortCollector 1st phase collector (which extends the TopGroupCollector). The Phase2GroupCollector stayed the same. Only the sort argument is the group sort.

          Also I've added the first tests in TestGroupingSearch mainly for the group sorting.
          I also created GroupSortCommand, a subclass of GroupCommandFunc that holds the group sort. I'm not sure, but maybe this belongs in GroupCommandFunc itself, because it is common functionality.

          The "docsPerGroup" name seems a little more verbose than normal - anyone have shorter ideas? Perhaps some kind of limit/offset params for the docs in a group.

          I like group.limit and group.offset. This allows the user to paginate the docs within a group. On the other hand, doing this will probably be memory inefficient, just like deep paging, only worse in this case because it is done for all groups.

          Martijn van Groningen added a comment -

          Cool stuff, Yonik! I noticed in the SolrIndexSearcher#groupBy method that you were instantiating a Lucene filter from the filter docset but not using it as an argument to the search method, so if someone specifies an fq it would not work. I'll dive deeper into the patch in the next few days.

          Yonik Seeley made changes -
          Attachment SOLR-1682.patch [ 12451961 ]
          Yonik Seeley added a comment -

          I'm getting back to grouping/collapsing... here's a development patch. I cleaned up a bunch of stuff, took a shot at coming up with a good HTTP API and response format, and enabled multiple field groupings in the same request.

          HTTP params:
          group=true/false (like faceting, turn on/off grouping)
          group.query=<the query> - this is analogous to facet.query, but currently not implemented
          group.field=<the field> - group by a field
          group.func=<the function> - group by a function
          group.limit - the top number of groups to report back (default is equal to the normal "limit" param, or 1 if unspecified)
          group.docsPerGroup - the top number of documents per group to return

          We'll need to be able to specify some of these per group. That hasn't been implemented, but utilizing local params seems natural (since the alternate f.<fieldname>.param method only works well with fields).

          The "docsPerGroup" name seems a little more verbose than normal - anyone have shorter ideas? Perhaps some kind of limit/offset params for the docs in a group.

          Here's an example of the current API:
          http://localhost:8983/solr/select?wt=json&indent=true&q=*:*&fl=id&rows=3&group=true&group.docsPerGroup=2&group.field=popularity&group.func=add%28popularity,popularity%29

          {
            "responseHeader":{
              "status":0,
              "QTime":2,
              "params":{
                "fl":"id",
                "indent":"true",
                "q":"*:*",
                "group.field":"popularity",
                "group.func":"add(popularity,popularity)",
                "group.docsPerGroup":"2",
                "group":"true",
                "wt":"json",
                "rows":"3"}},
            "grouped":{
              "popularity":{
                "groups":[{
                    "groupValue":6,
                    "matches":5,
                    "docs":{"numFound":5,"start":0,"docs":[
                        {
                          "id":"SP2514N"},
                        {
                          "id":"6H500F0"}]
                    }},
                  {
                    "groupValue":1,
                    "matches":2,
                    "docs":{"numFound":2,"start":0,"docs":[
                        {
                          "id":"F8V7067-APL-KIT"},
                        {
                          "id":"IW-02"}]
                    }},
                  {
                    "groupValue":10,
                    "matches":2,
                    "docs":{"numFound":2,"start":0,"docs":[
                        {
                          "id":"MA147LL/A"},
                        {
                          "id":"SOLR1000"}]
                    }}]},
              "add(popularity,popularity)":{
                "groups":[{
                    "groupValue":12.0,
                    "matches":5,
                    "docs":{"numFound":5,"start":0,"docs":[
                        {
                          "id":"SP2514N"},
                        {
                          "id":"6H500F0"}]
                    }},
                  {
                    "groupValue":2.0,
                    "matches":2,
                    "docs":{"numFound":2,"start":0,"docs":[
                        {
                          "id":"F8V7067-APL-KIT"},
                        {
                          "id":"IW-02"}]
                    }},
                  {
                    "groupValue":20.0,
                    "matches":2,
                    "docs":{"numFound":2,"start":0,"docs":[
                        {
                          "id":"MA147LL/A"},
                        {
                          "id":"SOLR1000"}]
                    }}]}}}
          
          Yonik Seeley made changes -
          Attachment SOLR-1682_prototype.patch [ 12449183 ]
          Yonik Seeley added a comment -

          Here's an updated patch. I started down the path of a combined hashmap/treemap... but reverted since it's pure premature optimization at this point.

          I did change the way that values are obtained from value sources (via a ValueFiller class). This avoids every single DocValues in a complex function from creating a MutableValue for no reason, and avoids a cast that would be needed if the user passed in a value to be filled each time.

          The second stage (for accurate counts, and for collapse counts greater than one) is now implemented.

          Example:
          http://localhost:8983/solr/select?q=*:*&groupby=popularity&docsPerGroup=3&fl=id,popularity

          Example of grouping by an arbitrary function:
          http://localhost:8983/solr/select?q=*:*&groupby=add%28popularity,popularity%29&docsPerGroup=3&fl=id,popularity

          Caveats:

          • much of this is still test code... the parameter names will change, as will the upper level interface code and response format. the focus so far has just been on the collectors and value sources.
          • Both the group sort and the documents within a group are currently governed by the "sort" param. This won't always be the case.
          • This is only a general purpose algorithm that should work with a minimum of memory usage - there will be many different algorithms that offer better performance in specific scenarios in the future.
          Yonik Seeley added a comment -

          ... there is a notion of CollapseCollector ...

          Seems like a useful concept, but perhaps for all docs in a group and not just the ones that aren't returned? Or if it is useful to distinguish, we could provide support for both (via different collectors, or different methods on the collector).

          Yonik Seeley added a comment -

          I've updated my phase1 collapsing to include an arbitrary sort, and fixed some bugs along the way.

          I've been trying to understand the other patches here... but there are some interesting mysteries.

          There are multiple occurrences of comparator code like this:

                  if ((c < 0 && reverseMul[i] > 0) || (c > 0 && reverseMul[i] < 0)) {
          

          but reverseMul has already been folded into "c", so a simple c<0 seems correct?

          Also, looking at the second stage collapsing (CollapsedDocCollector) I think I see a number of issues:

          • the counts collected in the first phase may not be correct, but are used in the second phase
          • setBottom is used, but there really needs to be a bottom per group (actually, compareBottom is never used anyway)
          • the field comparator is used incorrectly... compare(doc,slot) is called, but that's actually compare(slot1,slot2), so the results will be wrong (or an exception will be thrown).

          I'm surprised that any included tests passed given these apparent problems (unless I'm reading the code incorrectly). I think we'll need some very good random tests for this functionality to be sure we're hitting all of the corner cases.

          Anyway, I'm starting to implement the second phase collector with essentially a priority queue per group.

          Martijn van Groningen added a comment -

          I guess it depends... if this is the first phase only (just to find the top groups) then we don't really need the counts. If the collapse count is one... then we need to either fix the counts another way, and potentially provide an option to not return the counts.

          If no counts are required then it would be optimal and fast. In cases where counts or any other aggregate statistics are necessary, we would need to keep all the collapse groups in order to be accurate. Or we could offer an option where the aggregate values are 'estimated'; all these variants can be different implementations. I think we should get at least one implementation ready (preferably the fast one) plus the architecture for the different algorithms.

          In the patches in SOLR-236 there is a notion of a CollapseCollector, which accepts the document ids that are collapsed / grouped and are not returned in the regular result. Each implementation can do anything with these document ids, for example compute a count, max or average, or keep them in order to later return the collapsed documents in the collapse response. How do you see such a concept being integrated into this patch? Or do you think it's better to keep this functionality in the grouping implementations?

          There are other use cases where collapsed docs are more of an exception and the traditional single-doc-list would be better.

          That is true; there are a lot of options for returning this to the client in the response.

          Yonik Seeley added a comment -

          > these aren't valid if the group is ever discarded then re-added. keep track if there have been discards?

          I think this means that we have to keep all groups in memory.

          I guess it depends... if this is the first phase only (just to find the top groups) then we don't really need the counts. If the collapse count is one... then we need to either fix the counts another way, and potentially provide an option to not return the counts.

          Also we need to find a way of adding the collapse information to the response in a nice manner. I assume we still want to use the response format Shalin suggested? It does differ from the response the patch is currently generating.

          My prototype was just quick'n'dirty, to get the info out to the response writer and see if it worked.
          But I'm not yet sure what the response format should be (and I've changed my mind before based on what types of use cases I'm thinking about). For use cases like bestbuy (do a search for something like DVD to see their field collapsing), the grouping is pretty explicit and it seems to make sense to return multiple lists of documents. One may also want to have a different sort in grouped documents, and for that it makes little sense to try and combine them all into a single list. It's also the case that people may often want a fixed number of groups, rather than a fixed number of documents. Also, it's possible that people may want to group on more than one field (like people facet on more than one field).

          There are other use cases where collapsed docs are more of an exception and the traditional single-doc-list would be better. But instead of trying to implement all these variants at once, I'm thinking of starting with a more generic groupedResults separate from normal results. We can add options later for an alternate flattened representation.

          Martijn van Groningen made changes -
          Attachment SOLR-1682_prototype.patch [ 12448359 ]
          Martijn van Groningen added a comment -

          I checked out the patch and had to make two changes to get it working (the new patch attached contains the changes):

          • In the buildSet method I changed the comparator to this:
            Comparator<SearchGroup1> comparator = new Comparator<SearchGroup1>() {
                  public int compare(SearchGroup1 o1, SearchGroup1 o2) {
                    int comp =  fc.compare(o1.comparatorSlot, o2.comparatorSlot);
                    if (comp != 0) {
                      return comp;
                    }
                    if (o1.topDoc < o2.topDoc) {
                      return 1;
                    } else if (o1.topDoc > o2.topDoc) {
                      return -1;
                    }
            
                    return 0;
                  }
                };
            

            In cases where the sorting value was the same, collapse groups were lost (I was using the match-all query). This is the behaviour of TreeSet when the comparator returns 0.

          • When no sorting is specified, an NPE occurred. I temporarily fixed this by adding the following code in the groupBy method before the fieldComparator is initialized:
            if (cmd.sort == null) {
                  cmd.sort = new Sort(new SortField("score", SortField.SCORE, true));
                }
            

          I also saw the following todo in the code:

          these aren't valid if the group is ever discarded then re-added. keep track if there have been discards?

          I think this means that we have to keep all groups in memory. The cost is an increased memory footprint, but we then do get accurate collapse counts. This change can be put in a different implementation, of course.

          Also we need to find a way of adding the collapse information to the response in a nice manner. I assume we still want to use the response format Shalin suggested? It does differ from the response the patch is currently generating.

          Yonik Seeley made changes -
          Attachment SOLR-1682_prototype.patch [ 12448137 ]
          Yonik Seeley added a comment -

          Here's an early prototype of mine, updated to trunk. It's designed to collapse on an arbitrary function (value source). It only implements the first-phase, level-1 collapsing, and assumes a single comparator for simplicity.

          The previous patches here (which I've just now started looking at) look pretty solid. It's surprising how similar the first-level collapse is (a single field comparator group is used for everything, building the treeset is deferred, etc).

          I have some time to work on this now... I think I'll approach it by merging this patch with mine, working on getting a single collapse option working well enough to commit. The original SOLR-236 is simply too big and it feels like there are too many options to grapple with at the same time.

          Hoss Man made changes -
          Fix Version/s Next [ 12315093 ]
          Fix Version/s 1.5 [ 12313566 ]
          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Lance Norskog added a comment -

          What's the status on this? Has this patch served its purpose in life? Should it grow into a committable patch?

          Otis Gospodnetic made changes -
          Link This issue relates to SOLR-236 [ SOLR-236 ]
          Shalin Shekhar Mangar made changes -
          Attachment SOLR-1682.patch [ 12430015 ]
          Shalin Shekhar Mangar added a comment -

          Patch which fixes the inconsistent names for the meta fields.

          Shalin Shekhar Mangar added a comment -

          Shalin, I tried your patch out and I ran into a few problems with sorting and the collapse counts which turned out to be bugs.

          Thanks Martijn.

          Though I have a question about the response format. When collapse.threshold is > 1 and more than one document is collapsed, the collapse.count is named group.size. The field group.numFound is then added as well. Why did you give it a different name?

          Actually I intended to rename "collapse.value" to "group.value" and "collapse.count" to "group.numFound" but I forgot to do it in both places.

          • group.numFound = the total number of documents belonging to this group (i.e. have the same group.value)
          • group.size = the number of documents in this result set belonging to the same group which is equal to min(group.numFound, collapse.threshold)

          So when collapse.threshold = 1, group.size=1 and group.numFound will be equal to the number of documents in the same group. Suppose collapse.threshold = 5, but group.numFound=4 then group.size=4. The group.size is required to read all docs belonging to the same group without having to maintain a set. Let me know if you have suggestions for a better name than these.
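
          A concrete (made-up) illustration of those definitions, with collapse.threshold=2 and 7 documents sharing the same group value:

            group.value    = "acme"
            group.numFound = 7              (all docs in the index with this group value)
            group.size     = min(7, 2) = 2  (docs from this group present in the result set)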

          When collapse.threshold is larger than one, two collectors are used. I understand that in both situations a different algorithm is used. But now a search is also done twice. Wouldn't it be better to have two completely distinct collectors that don't depend on one another?

          We can have distinct collectors. The CollapsedDocCollector uses some of the data that TopGroupCollector gathers and that is why it uses it directly. We could keep references to the individual objects that are needed too. As I said, this is just a PoC and not the final design.

          I'll give a new patch with the names fixed for both the cases.

          Noble Paul added a comment -

          But now a search is also done twice. Wouldn't it be better to have two completely distinct collectors that don't depend on one another?

          Both collectors are designed to complement each other so that one can piggyback on the other and minimize the code/work.

          The field group.numFound is then added as well. Why did you give it a different name?

          The names are up for debate. Let us reach a consensus on that. When collapse.threshold=1, collapse.count/collapse.groupSize can be avoided.

          Martijn van Groningen made changes -
          Attachment SOLR-1682.patch [ 12429403 ]
          Martijn van Groningen added a comment -

          Shalin, I tried your patch out and I ran into a few problems with sorting and the collapse counts, which turned out to be bugs.

          1. When I was sorting in ascending order (on a field or score), the order itself was incorrect.
          2. The collapse count was always one (when threshold=1, the default, was specified). I suppose the count should be incremented every time a document is collapsed.

          I fixed these issues in the new patch and added tests that show that.

          Though I have a question about the response format. When collapse.threshold is > 1 and more than one document is collapsed, then collapse.count is named group.size. The field group.numFound is then added as well. Why did you give it a different name?

          When collapse.threshold is larger than one, two collectors are used. I understand that a different algorithm is used in each situation, but now a search is also done twice. Wouldn't it be better to have two completely distinct collectors that don't depend on one another?

          Shalin Shekhar Mangar made changes -
          Attachment SOLR-236.patch [ 12429131 ]
          Shalin Shekhar Mangar added a comment -

          Here's an implementation based on Yonik's suggestion.

          This is just a PoC and not fit to be committed. This implementation uses one pass for collapse.threshold=1 and two passes for collapse.threshold>1, so it should be a lot faster than the previous method, though I haven't benchmarked it yet. Memory consumption should be proportional to start+count instead of the index size.

          What is covered:

          1. Non-adjacent collapsing
          2. collapse.threshold
          3. New response format
          4. Includes DocSetAwareCollector interface from SOLR-1680

          What is not covered:

          1. Adjacent collapsing
          2. Aggregate functions (should be easy to add)
          3. Faceting (it doesn't keep/return the docsets needed for FacetComponent)
          4. Caching
          5. This implementation does not return the correct numFound

          The response adds special fields only to the first document in a group. Here's a sample of such a document:

          <doc>
                <int name="id">1</int>
                <str name="name_s1">author1</str>
                <str name="title_s1">a tree</str>
                <date name="timestamp">2009-12-30T10:16:51.944Z</date>
                <arr name="multiDefault">
                  <str>muLti-Default</str>
                </arr>
                <int name="intDefault">42</int>
                <str name="collapse.value">author1</str>
                <int name="collapse.count">1</int>
                <float name="score">0.67107505</float>
              </doc>
          

          See TestCollapseComponent.java for example usage.
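
          For a quick manual check against the example setup, a request along these lines should show the behaviour described above (the collapse.field parameter name is an assumption on my part; only collapse.threshold and the response fields are spelled out here, and the field names come from the sample document):

          http://localhost:8983/solr/select?q=*:*&collapse.field=name_s1&collapse.threshold=1

          With collapse.threshold=1, only the first document per name_s1 value should come back, carrying the collapse.value and collapse.count fields shown in the sample.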

          Shalin Shekhar Mangar made changes -
          Assignee Shalin Shekhar Mangar [ shalinmangar ]
          Shalin Shekhar Mangar made changes -
          Summary The field collapse Implement CollapseComponent
          Affects Version/s 1.3 [ 12312486 ]
          Martijn van Groningen added a comment -

          Well, it is the core functionality without the changes to the Solr core, the SolrJ changes, and the distributed field collapsing code. So it is not exactly the same.

          Shalin Shekhar Mangar added a comment -

          Isn't this issue the same as SOLR-236? It is better to have patches in one place than in two. Let's close this one.

          Martijn van Groningen made changes -
          Field Original Value New Value
          Attachment field-collapsing.patch [ 12428656 ]
          Martijn van Groningen added a comment -

          The code is taken from the latest patch in SOLR-236.

          Martijn van Groningen created issue -

            People

            • Assignee: Shalin Shekhar Mangar
            • Reporter: Martijn van Groningen
            • Votes: 7
            • Watchers: 24
