Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4, 4.0-ALPHA
    • Component/s: modules/grouping
    • Labels:
      None
    • Lucene Fields:
      Patch Available

      Description

      This issue focuses on implementing post-grouping faceting.

      • How to handle multivalued fields: which field value to show with the facet.
      • What the facet counts should be based on:
        • The normal documents (ungrouped counts).
        • The groups (grouped counts).
        • The combination of group value and facet value (matrix counts).

      And probably more implementation options.

      The first two methods are implemented in the SOLR-236 patch. For the first option it calculates a DocSet based on the individual documents from the query result. For the second option it calculates a DocSet of the most relevant document of each group. Once the DocSet is computed, the FacetComponent and StatsComponent use it to create facets and statistics.

      The last one is a bit more complex. I think it is best explained with an example. Let's say we search on travel offers:

      hotel     departure_airport   duration
      Hotel a   AMS                 5
      Hotel a   DUS                 10
      Hotel b   AMS                 5
      Hotel b   AMS                 10

      If we group by hotel and have a facet for airport, most end users expect (in my experience, of course) the following airport facet:
      AMS: 2
      DUS: 1

      The above result can't be achieved by the first two methods. You either get AMS: 3 and DUS: 1 (ungrouped counts), or a count of 1 for both airports (grouped counts).
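The three counting modes can be contrasted on the four travel offers above. The sketch below is a plain-Java illustration, not the SOLR-236 or Lucene grouping code: the class `FacetCountModes`, its `Offer` type, and the choice of the longest-duration offer as each group's head document are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative only: plain collections, not the collectors from the patch.
class FacetCountModes {

    // One travel offer row from the example table.
    static final class Offer {
        final String hotel;
        final String airport;
        final int duration;

        Offer(String hotel, String airport, int duration) {
            this.hotel = hotel;
            this.airport = airport;
            this.duration = duration;
        }
    }

    static List<Offer> sampleOffers() {
        return Arrays.asList(
            new Offer("Hotel a", "AMS", 5),
            new Offer("Hotel a", "DUS", 10),
            new Offer("Hotel b", "AMS", 5),
            new Offer("Hotel b", "AMS", 10));
    }

    // Ungrouped counts: every matching document contributes one count.
    static Map<String, Integer> ungroupedCounts(List<Offer> offers) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Offer o : offers) {
            counts.merge(o.airport, 1, Integer::sum);
        }
        return counts;
    }

    // Grouped counts: only the head document of each group contributes.
    // Assumption: the head is the offer with the longest duration.
    static Map<String, Integer> groupedCounts(List<Offer> offers) {
        Map<String, Offer> heads = new LinkedHashMap<>();
        for (Offer o : offers) {
            Offer head = heads.get(o.hotel);
            if (head == null || o.duration > head.duration) {
                heads.put(o.hotel, o);
            }
        }
        return ungroupedCounts(new ArrayList<>(heads.values()));
    }

    // Matrix counts: each distinct (group value, facet value) pair counts once.
    static Map<String, Integer> matrixCounts(List<Offer> offers) {
        Set<List<String>> seen = new HashSet<>();
        Map<String, Integer> counts = new TreeMap<>();
        for (Offer o : offers) {
            if (seen.add(Arrays.asList(o.hotel, o.airport))) {
                counts.merge(o.airport, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Offer> offers = sampleOffers();
        System.out.println("ungrouped: " + ungroupedCounts(offers)); // {AMS=3, DUS=1}
        System.out.println("grouped:   " + groupedCounts(offers));   // {AMS=1, DUS=1}
        System.out.println("matrix:    " + matrixCounts(offers));    // {AMS=2, DUS=1}
    }
}
```

With a different head-selection rule the grouped counts change (for instance, picking the first offer per hotel would give AMS: 2 and no DUS at all), which is part of why only the matrix counts reliably match the expected facet.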

      1. LUCENE-3097.patch
        50 kB
        Martijn van Groningen
      2. LUCENE-3097.patch
        49 kB
        Martijn van Groningen
      3. LUCENE-3097.patch
        33 kB
        Martijn van Groningen
      4. LUCENE-3097.patch
        26 kB
        Martijn van Groningen
      5. LUCENE-3097.patch
        25 kB
        Martijn van Groningen
      6. LUCENE-30971.patch
        44 kB
        Martijn van Groningen

          Activity

          Martijn van Groningen added a comment -

          Yes, if you're using Solr. You can try applying the patch; it should work for field facets.

          Ian Grainger added a comment -

          Oh, sorry, I just read the previous comment properly. So the case I need fixing is SOLR-2898?

          Ian Grainger added a comment -

          Hi - is the matrix count feature available in Solr 3.5? Seeing as this is marked as closed I assume it is? If so do I need to do anything to use this feature?

          Martijn van Groningen added a comment -

          Support for real grouped faceting (matrix counts) still needs to be added to Solr or the faceting module.

          Martijn van Groningen added a comment -

          Well, the code that got committed only creates facets for the most relevant document per group. This isn't really grouped facets. To implement this we need to modify Solr's faceting code / facet module code. So I think we can close this one and open a Solr issue to implement grouped facets in Solr (I do have some code for this, but it isn't perfect...) and maybe also an issue to add this to the faceting module.

          Simon Willnauer added a comment -

          Martijn, is this done? It seems like you committed to 3.x and trunk. If so, can you close/resolve this issue?

          Bill Bell added a comment -

          Set this to resolved?

          Michael McCandless added a comment -

          Thanks!

          Woops – also the comment in the java code in the package.html still says OpenBitSet!

          Martijn van Groningen added a comment -

          I've fixed that. It now references FixedBitSet.

          Michael McCandless added a comment -

          The package.html still references OpenBitSet here?

          Martijn van Groningen added a comment -

          Committed to trunk (rev. 1150470) and 3x branch (rev. 1150472). I'll keep this issue open for future developments.

          Martijn van Groningen added a comment -

          Updated patch again. I forgot to update some jdocs.

          Martijn van Groningen added a comment -

          Updated the patch.

          • Fixed the test failure that Michael reported. The numbits of the bitset was equal to the number of ids I set, so the Java assertion failed. Obviously the numbits must be at least one greater than the largest id being set.
          • Changed retrieveGroupHeads(maxDoc) return type to FixedBitSet.
          • Moved random tests related to AbstractAllGroupHeadsCollector to TermAllGroupHeadsCollectorTest.
          • During the random tests I ran into some other test failures related to sorting inside a group. I fixed those test failures as well.

          I think the patch is now ready to be committed!
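The sizing rule behind that fix can be shown with a small stand-in. The sketch below is not Lucene's FixedBitSet; `FixedSizeBits` and `fromGroupHeads` are hypothetical names, and the bounds check plays the role of the assertion that failed in the test. It only illustrates that a fixed-size bit set holding doc ids must be sized by maxDoc (largest id + 1), not by the largest id or by the number of ids.

```java
// Minimal stand-in for a fixed-size bit set; NOT Lucene's FixedBitSet.
class FixedSizeBits {
    private final long[] words;
    private final int numBits;

    FixedSizeBits(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6];
    }

    // Valid indices are 0 .. numBits - 1; anything else is rejected.
    void set(int index) {
        if (index < 0 || index >= numBits) {
            throw new IndexOutOfBoundsException("index=" + index + " numBits=" + numBits);
        }
        words[index >>> 6] |= 1L << (index & 63);
    }

    boolean get(int index) {
        return index >= 0 && index < numBits
            && (words[index >>> 6] & (1L << (index & 63))) != 0;
    }

    // Builds a bit set of group-head doc ids, sized by maxDoc.
    static FixedSizeBits fromGroupHeads(int[] groupHeads, int maxDoc) {
        FixedSizeBits bits = new FixedSizeBits(maxDoc);
        for (int doc : groupHeads) {
            bits.set(doc);
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] heads = {0, 3, 7};
        FixedSizeBits ok = fromGroupHeads(heads, 8); // largest id 7, so 8 bits
        System.out.println(ok.get(7));               // true
        try {
            fromGroupHeads(heads, 7);                // one bit too few for id 7
        } catch (IndexOutOfBoundsException e) {
            System.out.println("too small: " + e.getMessage());
        }
    }
}
```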

          Michael McCandless added a comment -

          Hmm I hit a test failure w/ this patch:

          ant test -Dtestcase=TermAllGroupHeadsCollectorTest -Dtestmethod=testRetrieveGroupHeadsAsArrayAndOpenBitset -Dtests.seed=-8084704095495262480:-1926953444883897447
          

          Also: can this collector use the new FixedBitSet instead of OpenBitSet...?

          Martijn van Groningen added a comment -

          Updated the patch.

          • Included the grouping collector into the random test.
          • Added more documentation.

          I think this collector is ready to be committed. This collector implements the second grouping / faceting case that I've described in the issue description.

          Martijn van Groningen added a comment -

          Attached updated patch.

          • Updated to current trunk.
          • Added a first test that proves the basic functionality is working.

          Things to be done:

          • Hook the collectors into the random test.
          • Some more javadoc
          • Backport to 3x.
          Martijn van Groningen added a comment -

          An updated version of the patch. This is still work in progress.

          I basically rewrote the code in the same way as the other collectors were rewritten for LUCENE-3099.

          Things to do: create tests and add some more documentation. This patch only covers the second facet / grouping method.

          Martijn van Groningen added a comment - - edited

          Also, this patch won't properly count facets if the field ever has multiple values within one group

          That is true. If facet values are different within a group the current collectors in the patch won't notice that.
          For the case Bill is describing, the facets work as expected with the current patch.

          But maybe that's fine for the first go.... progress not perfection.

          Definitely! But to continue I think we need the facet module.

          Michael McCandless added a comment -

          Also, this patch won't properly count facets if the field ever has multiple values within one group. But maybe that's fine for the first go.... progress not perfection.

          Robert Muir added a comment -

          bulk move 3.2 -> 3.3

          Martijn van Groningen added a comment -

          OK... This issue seems stalled? Are we waiting on something else?

          For the current attached patch I think that we first need to have the same abstraction as the collectors in LUCENE-3099 have. I think that it can be committed. After that we only need to wire it up in Solr (I'll open a new issue for that). Then we have the same behavior as in the SOLR-236 patch with the facet.after option. Don't worry, we'll get this soon!

          This patch only supports computing the grouped counts, not the other count variants. I think for those we also depend on the faceting module.

          Bill Bell added a comment -

          OK... This issue seems stalled? Are we waiting on something else?

          Michael McCandless added a comment -

          Right, I think for post-grouping facet counts, the facet counting
          process must be aware of the groups. Within each group, it can only
          count each value (color=red, size=S) once...

          Bill Bell added a comment - - edited

          One way to do this would be to treat each grouping as unique fields. That would solve both use cases:

          My use case would work for top doc per group, but I can see that the counting looks for "unique values in the field per group". So, for counting purposes, your example would look like this for color:

           name=3-wolf shirt
              color=red
              color=blue
          
            name=frog shirt
              color=white
              color=red
          

          color
          red=2, blue=1, white=1

          For size the counting looks like:

          name=3-wolf shirt
              size=M, color=red
              size=S, color=red
              size=L, color=blue
          
            name=frog shirt
              size=M, color=white
              size=S, color=red
          

          size
          M=2, S=2, L=1

          And the facets for size would not change for:

          name=3-wolf shirt
              size=M, color=red
              size=S, color=red
              size=L, color=blue
              size=S, color=blue
              size=S, color=blue
              size=L, color=blue
          
            name=frog shirt
              size=M, color=white
              size=S, color=red
          

          Thanks.

          Michael McCandless added a comment -

          I think you need to hold the docBase from each setNextReader and re-base your docs stored in the GroupHead?

          I think I'm doing that. If you look at the updateHead() methods, you'll see that I'm rebasing the ids.

          Ahh excellent, I missed that. Looks good!

          Once docs within one group can have different values for field X then we need a different approach for counting their facets...

          But that would only happen if an update happens during a search?
          Then all collectors can have this problem, right?

          This is independent of updating during search I think.

          I don't think the existing collectors have a problem here? Ie the
          grouping collectors aren't normally concerned w/ multivalued fields of
          the docs within each group.

          It's only because we intend for these new group collectors to make
          "post-grouping facet counting" work in Solr that we have a problem.
          Ie, these collectors won't properly count facets of fields that have
          different values w/in one group?

          Say this is my original content:

            name=3-wolf shirt
              size=M, color=red
              size=S, color=red
              size=L, color=blue
          
            name=frog shirt
              size=M, color=white
              size=S, color=red
          

          But, I'm not using nested docs (LUCENE-2454), so I had to fully
          denormalize into these docs:

            name=3-wolf shirt, size=M, color=red
            name=3-wolf shirt, size=S, color=red
            name=3-wolf shirt, size=L, color=blue
            name=frog shirt,   size=M, color=white
            name=frog shirt,   size=S, color=red
          

          Now, if user does a search for "color=red"... without post-group
          faceting (ie what Solr has today), you incorrectly see count=3 for
          color=red.

          With post-group faceting, you should see count=2 for color=red (which
          these collectors will do, correctly, I think?), but you should also
          see count=2 for size=S, which I think these collectors will fail to
          do? (Ie, because they only retain the top doc per group...?).

          Martijn van Groningen added a comment -

          I think create() needs to be fixed to handle other SortField types? Eg, INT, FLOAT?

          Oops I forgot. We need to use the general impl for that.

          I think you need to hold the docBase from each setNextReader and re-base your docs stored in the GroupHead?

          I think I'm doing that. If you look at the updateHead() methods, you'll see that I'm rebasing the ids.

          Once docs within one group can have different values for field X then we need a different approach for counting their facets...

          But that would only happen if an update happens during a search? Then all collectors can have this problem, right?

          Michael McCandless added a comment -

          Patch looks good Martijn! A few small things:

          • I think create() needs to be fixed to handle other SortField
            types? Eg, INT, FLOAT?
          • I think you need to hold the docBase from each setNextReader and
            re-base your docs stored in the GroupHead? Because when you
            retrieve them in the end you return them as top-level docIDs.

          This would really benefit from the random test in TestGrouping.

          This can indeed help with post-facet counting, but I think only on
          fields whose value is constant within the group? (Ie, because we pick
          only the "head" doc, as long as the head doc is guaranteed to have the
          same value for field X, it's safe to use that doc to represent the
          entire group for facet counting).

          Once docs within one group can have different values for field X then we
          need a different approach for counting their facets...
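The docBase point can be sketched as follows. This is a simplified illustration and not the collector from the patch: `GroupHeadSketch`, `setNextSegment` and `collect` are hypothetical names, groups are plain strings, and the head of a group is assumed to be its highest-scoring document. The only idea it demonstrates is that segment-relative doc ids are converted to top-level ids (docBase + doc) at the moment the head is stored.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of re-basing group-head doc ids across segments.
class GroupHeadSketch {

    // Current best document of one group, stored as a top-level doc id.
    private static final class Head {
        int globalDoc;
        float score = Float.NEGATIVE_INFINITY;
    }

    private final Map<String, Head> heads = new LinkedHashMap<>();
    private int docBase;

    // Called when collection moves to a new segment: remember where it starts.
    void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    // segmentDoc is relative to the current segment, as a collector would see it.
    void collect(int segmentDoc, String group, float score) {
        Head head = heads.computeIfAbsent(group, g -> new Head());
        if (score > head.score) {
            head.score = score;
            head.globalDoc = docBase + segmentDoc; // re-base before storing
        }
    }

    // Returns the group heads as sorted top-level doc ids.
    int[] retrieveGroupHeads() {
        int[] docs = new int[heads.size()];
        int i = 0;
        for (Head head : heads.values()) {
            docs[i++] = head.globalDoc;
        }
        Arrays.sort(docs);
        return docs;
    }

    static int[] demo() {
        GroupHeadSketch collector = new GroupHeadSketch();
        collector.setNextSegment(0);       // first segment holds docs 0..2
        collector.collect(0, "a", 1.0f);
        collector.collect(2, "b", 0.5f);
        collector.setNextSegment(3);       // second segment starts at doc 3
        collector.collect(0, "a", 2.0f);   // better head for "a": top-level id 3
        collector.collect(1, "b", 0.1f);   // worse than the current head of "b"
        return collector.retrieveGroupHeads();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // [2, 3]
    }
}
```

Without the re-basing step the head of group "a" would be reported as doc 0, which is a different document in the first segment.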

          Martijn van Groningen added a comment -

          Attached an initial patch with a collector that collects the most relevant documents for each group that match the query.

          This collector can be used to create facets based on grouped counts. Actually the collector has many implementations for different situations. For example when the group sort within the groups is only score or fields. There is a general implementation that works for all sorts (e.g. a function).

          Just as in the caching collector there is a factory method that selects the most efficient collector based on the group sort.

          TODO:

          • Add tests
          • Clean up code / jdoc

          Feedback welcome!

          Simon Willnauer added a comment -

          Martijn, you should assign this issue to yourself to make sure it's not moved to version 3.3.

          Michael McCandless added a comment -

          Right, this'd mean all docs sharing a given group value are contiguous and in the same segment. The app would have to ensure this, in order to use a collector that takes advantage of it.
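If that adjacency guarantee holds, per-group unique counting needs only a small set that is reset at each group boundary. The sketch below assumes documents arrive in index order as hypothetical `{group, facetValue}` pairs; `ContiguousGroupFacets` is an illustrative name, not a class from the patch.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch: facet counting that relies on docs of a group being adjacent.
class ContiguousGroupFacets {

    // docs: {groupValue, facetValue} pairs in index order.
    // Precondition (the app must ensure it): docs of one group are contiguous.
    static Map<String, Integer> count(String[][] docs) {
        Map<String, Integer> counts = new TreeMap<>();
        Set<String> seenInGroup = new HashSet<>();
        String currentGroup = null;
        for (String[] doc : docs) {
            if (!doc[0].equals(currentGroup)) {
                // Group boundary: forget the values seen in the previous group.
                currentGroup = doc[0];
                seenInGroup.clear();
            }
            if (seenInGroup.add(doc[1])) {
                counts.merge(doc[1], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] adjacent = {
            {"Hotel a", "AMS"}, {"Hotel a", "DUS"},
            {"Hotel b", "AMS"}, {"Hotel b", "AMS"}};
        System.out.println(count(adjacent)); // {AMS=2, DUS=1}

        // Precondition violated: "Hotel a" is split, so AMS is counted twice for it.
        String[][] interleaved = {
            {"Hotel a", "AMS"}, {"Hotel b", "AMS"}, {"Hotel a", "AMS"}};
        System.out.println(count(interleaved)); // {AMS=3}
    }
}
```

The second call shows why the application has to enforce the ordering: once a group is split, the same (group, value) pair is counted again.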

          Martijn van Groningen added a comment -

          Ie, we just have to ensure, at indexing time, that docs within the same "group" are adjacent, if you want to be able to count by unique group values.

          This means that docs in the same group also need to be in the same segment, right? Or, if we use this mechanism for faceting, docs with the same facet need to be in the same segment? If that is true, it would make the collectors simpler. The SentinelIntSet we use in the collectors would not be necessary, because we can look up the norm from the DocIndexTerms. We won't find the same group in a different segment. On the other hand, with scalability in mind, this would make things complex: documents in the same group need to be in the same segment, which makes indexing complex.

          Michael McCandless added a comment -

          In fact, I think a very efficient way to implement post-group faceting is something like LUCENE-2454.

          Ie, we just have to ensure, at indexing time, that docs within the same "group" are adjacent, if you want to be able to count by unique group values.

          Hmm... but I think this (what your "identifier" field is, for facet counting purposes) should be decoupled from how you group. I may group by State, for presentation purposes, but count facets by doctor_id.

          Michael McCandless added a comment -

          Right, gender in this example was single-valued per group.

          Another way to visualize / define how post-group faceting should behave is: imagine for every facet value (ie field + value) you could define an aggregator. Today, that aggregator is just the count of how many docs had that value from the full result set. But you could instead define it to be "count(distinct(doctor_id))", and then you'll get the group counts you want. (Other aggregators are conceivable – max(relevance), min+max(prices), etc.).

          Conceptually I think this also defines the post-group faceting functionality, even if we would never implement it this way (ie count(distinct(doctor_id)) would be way too costly to do naively).
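The two aggregators can be compared on the doctor rows from Bill's example. The sketch below is the naive formulation that the comment warns would be too costly at scale; it is meant only to pin down the semantics. `DistinctCountFacets` and its `Row` type are hypothetical names, and only the doctorid and gender columns of the example are kept.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch; not an API of Lucene or Solr.
class DistinctCountFacets {

    // One denormalized row: a doctor at one office address.
    static final class Row {
        final int doctorId;
        final String gender;

        Row(int doctorId, String gender) {
            this.doctorId = doctorId;
            this.gender = gender;
        }
    }

    // The seven rows of the example, reduced to doctorid and gender.
    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row(1, "male"), new Row(1, "male"),
            new Row(2, "female"), new Row(2, "female"),
            new Row(3, "male"),
            new Row(4, "female"), new Row(4, "female"));
    }

    // Today's aggregator: count of docs per facet value.
    static Map<String, Integer> countDocs(List<Row> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Row row : rows) {
            counts.merge(row.gender, 1, Integer::sum);
        }
        return counts;
    }

    // Post-group aggregator: count(distinct(doctor_id)) per facet value.
    static Map<String, Integer> countDistinctDoctors(List<Row> rows) {
        Map<String, Set<Integer>> idsPerValue = new TreeMap<>();
        for (Row row : rows) {
            idsPerValue.computeIfAbsent(row.gender, k -> new TreeSet<>()).add(row.doctorId);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> entry : idsPerValue.entrySet()) {
            counts.put(entry.getKey(), entry.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        System.out.println(countDocs(rows));            // {female=4, male=3}
        System.out.println(countDistinctDoctors(rows)); // {female=2, male=2}
    }
}
```

Because the identifier is passed in with each row rather than taken from the grouping, the same approach works when the facet identifier differs from the field used for presentation grouping.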

          Martijn van Groningen added a comment -

          If I say, facet.field=gender I would expect:

          I think this can be achieved by basing the facet counts on the normal documents. Ungrouped counts.

          If we had Spatial, and I had lat long for each address, I would expect if I say sort=geodist() asc that it would group and then find the closest point for each grouping to return in the proper order. For example, if I was at 103 E 5th St, I would expect the output for doctorid=1 to be:

          This just depends on the sort / group sort you provide. I think this should already work in the Solr trunk.

          If I only need the 1st point in the grouping I would expect the other points to be omitted.

          This depends on the group limit you provide in the request.

          Michael McCandless added a comment -

          Thanks for the example Bill – that makes sense!

          I think, in general, the post-group faceting should act "as if" you had indexed a single document per group, with multi-valued fields containing the union of all field values within that group, and then done "normal" faceting. I believe this defines the semantics we are after for post-grouping faceting.
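
          A minimal sketch of the "as if" semantics described above, using the rows from Bill Bell's doctor example. The helper names (collapse_groups, facet_counts) are hypothetical and are not part of Lucene or Solr; the code only illustrates the intended counts, not how the patch computes them.

```python
from collections import Counter, defaultdict

# Rows from Bill Bell's example: (doctorid, name, gender, address).
ROWS = [
    (1, "Bill Bell", "male", "55 east main St"),
    (1, "Bill Bell", "male", "103 E 5th St"),
    (2, "Sue Jones", "female", "67 W 97th St"),
    (2, "Sue Jones", "female", "888 O'West St"),
    (3, "Toby Williams", "male", "8 Vale St"),
    (4, "Margie Youth", "female", "5 E Medical Center"),
    (4, "Margie Youth", "female", "98456 E Rose St"),
]


def collapse_groups(rows, group_idx, facet_idx):
    """Build one virtual document per group: group value -> set of facet values.

    The set is the union of the facet field's values over all rows in the group.
    """
    virtual_docs = defaultdict(set)
    for row in rows:
        virtual_docs[row[group_idx]].add(row[facet_idx])
    return virtual_docs


def facet_counts(virtual_docs):
    """Normal faceting over the virtual documents: a value counts once per group."""
    counts = Counter()
    for values in virtual_docs.values():
        counts.update(values)
    return counts


gender_counts = facet_counts(collapse_groups(ROWS, group_idx=0, facet_idx=2))
print(dict(gender_counts))  # {'male': 2, 'female': 2}
```

          The result matches the gender facet Bill expects (male: 2, female: 2), whereas counting the seven raw rows would give male: 3, female: 4.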

          Bill Bell added a comment -

          Here is another example...

          Doctors have multiple offices. I want to store doctorid, doctor's name, gender (male/female), and office address as separate rows. Then I want to group by doctorid. I only want the one doctor. I then want to facet by gender and see the numbers after it is grouped. I also want the total rows to be after grouping.

          doctorid, doctor's name, gender, address
          1, Bill Bell, male, 55 east main St
          1, Bill Bell, male, 103 E 5th St
          2, Sue Jones, female, 67 W 97th St
          2, Sue Jones, female, 888 O'West St
          3, Toby Williams, male, 8 Vale St
          4, Margie Youth, female, 5 E Medical Center
          4, Margie Youth, female, 98456 E Rose St

          I would expect the grouping to return:

          total rows = 7
          group total rows = 4
          group_by
          1,
          Bill Bell, male, 55 east main St
          Bill Bell, male, 103 E 5th St
          2,
          Sue Jones, female, 67 W 97th St
          Sue Jones, female, 888 O'West St
          3,
          Toby Williams, male, 8 Vale St
          4,
          Margie Youth, female, 5 E Medical Center
          Margie Youth, female, 98456 E Rose St

          I would expect if I say, rows=2, start=0, order by doctorid, I would get:

          1,
          Bill Bell, male, 55 east main St
          Bill Bell, male, 103 E 5th St
          2,
          Sue Jones, female, 67 W 97th St
          Sue Jones, female, 888 O'West St

          If I say, facet.field=gender I would expect:

          male: 2 (Bill Bell, Toby Williams)
          female: 2 (Sue Jones, Margie Youth)

          If we had Spatial, and I had lat long for each address, I would expect if I say sort=geodist() asc that it would group and then find the closest
          point for each grouping to return in the proper order. For example, if I was at 103 E 5th St, I would expect the output for doctorid=1 to be:

          group_by
          1,
          Bill Bell, male, 103 E 5th St
          Bill Bell, male, 55 east main St

          If I only need the 1st point in the grouping I would expect the other points to be omitted.

          group_by
          1,
          Bill Bell, male, 103 E 5th St
          2,
          Sue Jones, female, 67 W 97th St
          3,
          Toby Williams, male, 8 Vale St
          4,
          Margie Youth, female, 5 E Medical Center

          Thanks.

          Michael McCandless added a comment -

          If we group by hotel and have a facet for airport. Most end users expect (according to my experience of course) the following airport facet:

          +1, I think that semantics is intuitive. It treats each group as a doc w/ multi-valued field whose values are unioned from all docs within it. So group "Hotel a" has values AMS, DUS for the departure_airport field.
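
          The three counting options from the issue description can be compared on the travel-offer table with a short sketch. The function names are hypothetical, and the relevance function (longest duration wins) is an assumption made only so that each group has a well-defined "most relevant" document; the issue does not specify a ranking.

```python
from collections import Counter

# Travel offers from the issue description: (hotel, departure_airport, duration).
OFFERS = [
    ("Hotel a", "AMS", 5),
    ("Hotel a", "DUS", 10),
    ("Hotel b", "AMS", 5),
    ("Hotel b", "AMS", 10),
]


def ungrouped_counts(offers):
    """Option 1: count every matching document, ignoring groups."""
    return Counter(airport for _, airport, _ in offers)


def group_head_counts(offers, relevance):
    """Option 2: count only the most relevant document of each group."""
    heads = {}
    for offer in offers:
        hotel = offer[0]
        if hotel not in heads or relevance(offer) > relevance(heads[hotel]):
            heads[hotel] = offer
    return Counter(offer[1] for offer in heads.values())


def matrix_counts(offers):
    """Option 3: count each distinct (group value, facet value) pair once."""
    pairs = {(hotel, airport) for hotel, airport, _ in offers}
    return Counter(airport for _, airport in pairs)


# Assumed ranking for illustration only: the longest stay is the most relevant.
by_duration = lambda offer: offer[2]

print(sorted(ungrouped_counts(OFFERS).items()))                # [('AMS', 3), ('DUS', 1)]
print(sorted(group_head_counts(OFFERS, by_duration).items()))  # [('AMS', 1), ('DUS', 1)]
print(sorted(matrix_counts(OFFERS).items()))                   # [('AMS', 2), ('DUS', 1)]
```

          Only the third function yields AMS: 2, DUS: 1. It is equivalent to treating "Hotel a" as one document whose departure_airport field holds both AMS and DUS.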


            People

            • Assignee:
              Martijn van Groningen
              Reporter:
              Martijn van Groningen
            • Votes:
              5
              Watchers:
              13
