Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      Support grouped faceting, as described in LUCENE-3097 (matrix counts).

      1. SOLR-2898.patch
        11 kB
        Martijn van Groningen
      2. SOLR-2898.patch
        41 kB
        Martijn van Groningen
      3. SOLR-2898.patch
        17 kB
        Martijn van Groningen
      4. SOLR-2898.patch
        16 kB
        Martijn van Groningen

          Activity

          Martijn van Groningen added a comment -

          Attached an initial patch that supports rudimentary grouped field facets for single-valued, non-tokenized string fields. Grouped faceting isn't yet implemented for query, range and pivot facets.

          This patch is compatible with trunk. To use it for all field facets, set group.facet=true, or specify it per field. See the test in the patch for more details.
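
A minimal sketch of what such a request could look like, built in Python. The host, core path and the field names `product_range` and `ppm` are assumptions for illustration only; `group`, `group.field`, `facet`, `facet.field` and `group.facet` are the request parameters discussed in this issue.

```python
from urllib.parse import urlencode


def grouped_facet_url(base_url, group_field, facet_fields, query="*:*"):
    """Build a select URL that asks for field facets computed per group."""
    params = [
        ("q", query),
        ("group", "true"),            # enable result grouping
        ("group.field", group_field),
        ("facet", "true"),            # enable faceting
        ("group.facet", "true"),      # count facet values per group, not per document
    ]
    # One facet.field parameter per faceted field.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and field names.
url = grouped_facet_url("http://localhost:8983/solr/select", "product_range", ["ppm"])
print(url)
```

The sketch only builds the URL; sending it and reading the response is left to whatever HTTP client is in use.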

          I just hacked some code into the SimpleFacets class. Supporting this for all types of facets will require a lot of changes in many places in this class. Currently I don't see another way...

          Ian Grainger added a comment - - edited

          A thought on this problem - I have a workaround which involves separate facet queries for the grouped facets using a different sort (so matching grouped records are always first in the group if available).

          I don't know how this works internally, but might it be possible to add a sort to the facet query? Like:

          facet.query={!sort=Monitor_ID+asc}Monitor_ID:[1+TO+9999]

          ?

          Martijn van Groningen added a comment -

          I don't completely follow. Let's say we group by Monitor_Model_ID (Monitor_ID would then be the unique id?) and facet by Brand. How would facet.query help us get grouped facets? As far as I know, facets are independent of each other.

          Ian Grainger added a comment -

          In my example I was grouping by some other field. Say I group by 'Company_ID' and then want to find companies with low monitor IDs. As long as I'm using group.truncate (which I happen to be), I only need to find groups that have at least one matching value. By sorting by monitor ID ascending, any low monitor IDs in a group will be the selected ones, and the facet counts will therefore be correct.

          I realise this doesn't fix the problem properly - but I was wondering if it was possible to use as a workaround (a specific sort per facet) as I don't know how it works internally.

          Okke Klein added a comment -

          Patch works well. Would it also work for facet.date, or do I have to wait for support for range queries?

          Martijn van Groningen added a comment -

          Nice to know that the patch works in your environment! The current patch doesn't support range, date and pivot facets.
          Grouped faceting only supports facet.field for string-based fields that are not tokenized and not multivalued.

          Steven Heijtel added a comment -

          I tried the patch, and it does work, thanks for that. Is there any expectation of when this feature will be released and/or when range queries will be supported? At least I've voted for it.

          Something that took me ages to understand, and that I would like to share with others: in contrast to non-grouped facets, (sum of all facets) == (total of products with that property) does not hold when the results are grouped. See the following objects:

          object 1

          • name: Phaser 4620a
          • ppm: 62
          • product_range: 6

          object 2

          • name: Phaser 4620i
          • ppm: 65
          • product_range: 6

          object 3

          • name: ML6512
          • ppm: 62
          • product_range: 7

          If I group on "product_range", then the total number of groups is 2, but the facets for ppm are:

          62 --> 2
          65 --> 1

          It makes complete sense, but in my case it was quite difficult to figure out.

          Martijn van Groningen added a comment -

          I tried the patch, and it does work, thanks for that. Is there any expectations when this feature will be released and/or when range queries are expected? At least I've voted for it

          Nice! The idea I had in mind was to support grouped facets for all facet types and methods. However, this requires a lot of changes in the current code (SimpleFacets class) and I think the code would become even more complex than it already is. I was thinking about creating a GroupedFacets class and then supporting more facet types with grouping step by step. But this is just an idea.

          The counts make sense, but are sometimes difficult to understand. We have 2 groups (6, 7) and two distinct ppm values (62, 65). Value 62 occurs in both groups and value 65 in one group. So the counts are: 62=2 and 65=1.
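
The counting rule described here can be sketched in a few lines of Python. This is only an illustration of the semantics (each facet value is counted once per distinct group it occurs in), not the code in the patch; the function name `grouped_facet_counts` is made up for the example.

```python
from collections import defaultdict


def grouped_facet_counts(docs, group_field, facet_field):
    """Count, for every facet value, the number of distinct groups containing it."""
    groups_per_value = defaultdict(set)
    for doc in docs:
        groups_per_value[doc[facet_field]].add(doc[group_field])
    return {value: len(groups) for value, groups in groups_per_value.items()}


# The three objects from the comment above.
docs = [
    {"name": "Phaser 4620a", "ppm": 62, "product_range": 6},
    {"name": "Phaser 4620i", "ppm": 65, "product_range": 6},
    {"name": "ML6512", "ppm": 62, "product_range": 7},
]

counts = grouped_facet_counts(docs, "product_range", "ppm")
print(counts)  # {62: 2, 65: 1}
```

The counts sum to 3 although there are only 2 groups, because group 6 contains both ppm values and is therefore counted under each of them.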

          Mark Desira added a comment - - edited

          Hi I am new to SOLR and was looking for (what seems to be) the function/patch in question.

          My scenario is this:

          ID : ProductName       : ProductCategory : Colour
          -------------------------------------------------
          1  : BatmanTShirt      : T-Shirt         : Black
          2  : BatmanTShirt      : T-Shirt         : Blue
          3  : SupermanTShirt    : T-Shirt         : Blue
          4  : SpidermanTrousers : Trousers        : Red
          5  : SpidermanTrousers : Trousers        : Black

          If I use the usual faceting (on ProductCategory) in SOLR, I would get the following results:

          T-Shirt (3)
          Trousers (2)

          However what I want the facets to look like is the following:

          T-Shirt (2)
          Trousers (1)

          Meaning that I don't want the colour to generate (3) counts for T-Shirt and (2) counts for Trousers.
          I know that I can 'normalize' the document to hold a multi-valued field for the colours; however, that would complicate my system a bit, because I have other fields (such as 'Price' and 'Size') that would also have to become multi-valued.

          What I would like to do is a sort of GROUP BY ProductName and ProductCategory. This would 'flatten' the rows to just 3: (2) for T-Shirt and (1) for Trousers. I would then apply the facet, resulting in what I require.

          From what I understand, this patch would work great for me. Am I right?
          If so, could you point me to a tutorial on how to apply this patch?

          Thanks.

          P.S. Sorry for issuing multiple edits for this comment but I was attempting to make the 'Table' text monospaced, sadly without any success. (As you can see I'm new to wikis as well)

          Steven Heijtel added a comment -

          @martijn
          I've tried to understand the source code, but I really have no clue how it works. So I cannot help you with that. But I'll try to give you some moral support

          @mark
          Just use grouping on "ProductName"; with this patch you then get the counts you want, as far as I understand. Patching is not that difficult: just patch the source code with `patch -p0 < patchfile` and after that follow this manual: http://dev.modmancer.com/index.php/2010/04/28/setting-up-solr-from-nightly-builds-svn/
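
As a sanity check of the expected numbers, here is a small Python sketch over Mark's five rows. It compares plain per-document facet counts with counts taken once per ProductName group. It only mimics the intended semantics on in-memory data and does not call Solr.

```python
from collections import Counter

# (ID, ProductName, ProductCategory, Colour), copied from the table above.
rows = [
    (1, "BatmanTShirt", "T-Shirt", "Black"),
    (2, "BatmanTShirt", "T-Shirt", "Blue"),
    (3, "SupermanTShirt", "T-Shirt", "Blue"),
    (4, "SpidermanTrousers", "Trousers", "Red"),
    (5, "SpidermanTrousers", "Trousers", "Black"),
]

# Plain faceting: every document contributes to its category.
plain = Counter(category for _, _, category, _ in rows)

# Grouped faceting on ProductName: each (group, category) pair counts once.
distinct_pairs = {(name, category) for _, name, category, _ in rows}
grouped = Counter(category for _, category in distinct_pairs)

print(sorted(plain.items()))    # [('T-Shirt', 3), ('Trousers', 2)]
print(sorted(grouped.items()))  # [('T-Shirt', 2), ('Trousers', 1)]
```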

          Mark Desira added a comment -

          Thanks Steven I will attempt that as soon as I am back.

          Mark Desira added a comment -

          I have applied this patch as instructed and it seems to have worked. Just to clarify one thing: does the faceting work according to the first group.field listed in the query string?

          Martijn van Groningen added a comment -

          Yes, the grouped faceting works with the first group.field parameter. Other group.field parameters are ignored.

          Mark Desira added a comment -

          Thanks Martijn. One thing I noticed is that the 'post-group-faceting' is active whether the group.facet parameter is true or false. As long as the group=true parameter is present, post-group-faceting becomes active, even when the group.facet parameter is not specified. Is this behaviour expected, or am I missing something?

          Martijn van Groningen added a comment -

          No, that shouldn't be the case. It should also check the group.facet parameter. I'll fix that.

          Mark Desira added a comment -

          Cool thanks let me know.

          Martijn van Groningen added a comment -

          Updated the patch to work with the latest version of trunk and added the group.facet parameter check that I forgot to add in the previous patch.

          Mark Desira added a comment -

          Hi Martijn, is the added group.facet the check for facet.group.after parameter?

          Martijn van Groningen added a comment -

          Attached a new patch that takes another approach. The patch contains the TermGroupFacetCollector class, which performs the grouped faceting. Grouped faceting is now done per segment instead of at the top level. From the user's point of view, grouped faceting works the same as in the previous patch.

          I think this approach is cleaner and grouped faceting isn't just a hack any more.

          The TermGroupFacetCollector is located in the grouping module. I'll open a new Lucene issue with the purpose of getting this collector committed to the grouping module. This issue will then depend on the new Lucene issue.

          Martijn van Groningen added a comment -

          Hi Martijn, is the added group.facet the check for facet.group.after parameter?

          Better late than never... Are you referring to the uncommitted FieldCollapse collapse.facet parameter? I think that isn't completely the same as what this grouped faceting is trying to achieve. Take a look at LUCENE-3097 (matrix facets).

          Martijn van Groningen added a comment -

          Updated the patch to the latest changes in LUCENE-3802. I think it is time to commit this. Grouped faceting based on range and query facets will be addressed in a different issue.

          Martijn van Groningen added a comment -

          Committed to trunk.

          Bjorn Hijmans added a comment -

          Hi Martijn, any idea when range and query facets will be added?

          David Boychuck added a comment -

          I am using Solr 3.6 and am trying to use grouping on products which can have many SKUs. The problem I am facing is that when I use group.truncate I am no longer able to use the StatsComponent to get the high and low price ranges. Would it be an easy fix to allow truncate exclusions, so that some facets would be truncated and others would not?

          Martijn van Groningen added a comment - - edited

          Bjorn: Right now I don't have an idea when this will be added. If you want you can make a new Jira issue to support grouped range and query facets.

          David: The FacetComponent and StatsComponent operate on a DocSet computed by the QueryComponent. The QueryComponent creates either an ungrouped DocSet or, when group.truncate is enabled, a grouped DocSet. To support your case, the QueryComponent would need to generate two DocSets (grouped and ungrouped) instead of one, so that the StatsComponent can operate on the ungrouped DocSet when group.truncate is enabled while the FacetComponent operates on the grouped DocSet. This should be controlled with a special parameter. Maybe something like this: group.computeStats=GROUPED|UNGROUPED
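
To make the difference between the two document sets concrete, here is a rough Python sketch. It assumes that the truncated set keeps only the top-ranked document of each group, and it uses made-up SKU data. The helper names are hypothetical, and group.computeStats is only the parameter proposed above, not an existing one.

```python
def truncated_docset(docs, group_field):
    """Keep the first (top-ranked) document of each group; docs are assumed sorted by rank."""
    seen = set()
    kept = []
    for doc in docs:
        if doc[group_field] not in seen:
            seen.add(doc[group_field])
            kept.append(doc)
    return kept


def price_stats(docs):
    """Return (min, max) price over a document set, like a minimal stats computation."""
    prices = [doc["price"] for doc in docs]
    return min(prices), max(prices)


# Hypothetical SKUs, already in ranking order.
docs = [
    {"sku": "A-1", "product": "A", "price": 10.0},
    {"sku": "A-2", "product": "A", "price": 25.0},
    {"sku": "B-1", "product": "B", "price": 40.0},
    {"sku": "B-2", "product": "B", "price": 5.0},
]

ungrouped = docs
grouped = truncated_docset(docs, "product")

print(price_stats(ungrouped))  # (5.0, 40.0)
print(price_stats(grouped))    # (10.0, 40.0)
```

In this toy data the cheapest SKU (B-2) is not the top-ranked document of its group, so statistics computed on the truncated set miss the real low price. That is the effect David describes.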

          David Boychuck added a comment - - edited

          I just realized that computing stats on an ungrouped docset still wouldn't work since I still need to do query facets on price ranges. I have created issue SOLR-3406 to address the problems described with not having the ability to use facet.query over a grouped docset.

          Martijn: This functionality is critical to my implementation of Solr and I would like to help develop a solution. You mention in a previous post

          The idea I had in mind was to support grouped facets for all facet types and methods. However this requires a lot of changes in the current code (SimpleFacets class) and I think the code becomes even more complex than it already is. I was thinking about creating GroupedFacets class and then step-by-step support more facet types with grouping. But this is just an idea.

          Have you started on GroupedFacets? I don't see it in trunk, but I wasn't sure if you had started something locally. I do see a GroupedFacetHit.class committed to trunk, but I'm not sure if that is related.

          At any rate, I am new to Solr development. Do you think you could point me in the right direction or give me your vision of how you see this implementation happening?

          Also, to anybody reading this post who would like to see this feature implemented: please vote for issue SOLR-3406.

          Martijn van Groningen added a comment -

          At any rate I am new to Solr development. Do you think you could point me in the right direction or give me your vision of how you see this implementation happening.

          Sure. Let's do this in SOLR-3406.

          David Boychuck added a comment -

          ok sounds good


            People

            • Assignee:
              Unassigned
              Reporter:
              Martijn van Groningen
            • Votes:
              7
              Watchers:
              15
