  2. SOLR-792

Pivot (ie: Decision Tree) Faceting Component

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      A component to do multi-level faceting.

      1. SOLR-792.patch
        8 kB
        Yonik Seeley
      2. SOLR-792.patch
        10 kB
        Erik Hatcher
      3. SOLR-792.patch
        11 kB
        Thibaut Lassalle
      4. SOLR-792.patch
        10 kB
        Jeremy Hinegardner
      5. SOLR-792.patch
        10 kB
        Jeremy Hinegardner
      6. SOLR-792.patch
        6 kB
        Erik Hatcher
      7. SOLR-792.patch
        5 kB
        Erik Hatcher
      8. SOLR-792-as-helper-class.patch
        27 kB
        Ryan McKinley
      9. SOLR-792-distributed.patch
        4 kB
        Dan Cooper
      10. SOLR-792-PivotFaceting.patch
        28 kB
        Ryan McKinley
      11. SOLR-792-PivotFaceting.patch
        25 kB
        Ryan McKinley
      12. SOLR-792-PivotFaceting.patch
        25 kB
        Ryan McKinley
      13. SOLR-792-PivotFaceting.patch
        21 kB
        Ryan McKinley
      14. SOLR-792-raw-type.patch
        3 kB
        Ryan McKinley

          Activity

          Antoine Le Floc'h added a comment -

          distributed pivot faceting

          Erik Hatcher added a comment -

          opening a new issue for distributed support

          Mark Miller added a comment -

          Perhaps we should close this and open another issue for the implementation of distributed pivoting?

          +1

          Dan Cooper added a comment -

          I have submitted a patch for distributed pivoting above, however I have since realised that while it works well on the small sample data it does not scale well. This is due to the use of NamedList objects for storing and merging the pivoted data.

          I have since improved the code to use maps whilst merging the pivots and it runs well across our large indexes. I will submit this again in a few days when I have had time to review and package up the code properly (there is some proprietary code I need to separate it from).

          My company relies on this functionality and we have it working with Solr 4 so we would like to contribute it back to the community if possible.
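
A minimal sketch of the map-based merge idea described above, not the code from the patch. It assumes each shard's pivot response has already been parsed into a small tree; `PivotNode`, `child` and `merge` are hypothetical names invented for this illustration. The point is only that keying children by value gives a hash lookup per merged entry, whereas finding a name in a list of name/value pairs means scanning the list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PivotMergeSketch {

    /** One node of a pivot tree: a document count plus child pivots keyed by value. */
    static final class PivotNode {
        int count;
        final Map<String, PivotNode> children = new LinkedHashMap<>();

        PivotNode(int count) {
            this.count = count;
        }

        /** Convenience for building example trees: adds a child and returns it. */
        PivotNode child(String value, int childCount) {
            PivotNode node = new PivotNode(childCount);
            children.put(value, node);
            return node;
        }
    }

    /** Merges 'from' into 'into', summing counts and recursing into children. */
    static void merge(PivotNode into, PivotNode from) {
        into.count += from.count;
        for (Map.Entry<String, PivotNode> e : from.children.entrySet()) {
            // Hash lookup per value instead of a linear scan over name/value pairs.
            PivotNode target = into.children.computeIfAbsent(e.getKey(), k -> new PivotNode(0));
            merge(target, e.getValue());
        }
    }

    public static void main(String[] args) {
        // Two hypothetical shard responses for a cat,manu_id_s pivot.
        PivotNode shard1 = new PivotNode(0);
        shard1.child("electronics", 10).child("belkin", 4);
        PivotNode shard2 = new PivotNode(0);
        shard2.child("electronics", 5).child("belkin", 1);
        shard2.child("memory", 3);

        PivotNode merged = new PivotNode(0);
        for (PivotNode shard : List.of(shard1, shard2)) {
            merge(merged, shard);
        }
        System.out.println(merged.children.get("electronics").count); // prints 15
    }
}
```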

          Jan Høydahl added a comment -

          +1 to closing and opening a new jira for distributed support (as we have done with other features in the past)

          Erik Hatcher added a comment -

          Why isn't this resolved yet?

          Ryan said:

          it does not (yet) support distributed. An early patch did – but I don't have any experience with making stuff distributed, so that part is waiting for contributions....

          Perhaps we should close this and open another issue for the implementation of distributed pivoting?

          Jan Høydahl added a comment -

          Why isn't this resolved yet?

          Li Fanxi added a comment - - edited

          I have a question about facet.pivot.mincount. Is this parameter defined as the "minimum number of documents" that a value must match to be included in the result?

          In the current implementation, I found that this parameter is also compared against the number of distinct facet values, because of the following code in the doPivots function:

              NamedList<Integer> nl = sf.getTermCounts(subField);
              if (nl.size() >= minMatch ) {
                  pivot.add( "pivot", doPivots( nl, subField, nextField, fnames, rb, subset, minMatch ) );
                  values.add( pivot ); // only add response if there are some counts
              }
          

          I don't understand why we need to compare minMatch to nl.size().

          With this code, suppose we do pivot faceting on the fields "cat,manu_id_s" and, for cat='electronics', we have 50 documents distributed across 3 different "manu_id_s" values. If we limit the result with facet.pivot.mincount=5, no result will be returned, because there are only 3 different "manu_id_s" values. Is this the desired behavior for the "facet.pivot.mincount" parameter?

          If this is not the desired behavior, what should the behavior be? My suggestion is to filter the results based only on document count, and to return to the caller both the actual count and the number of documents filtered out by the mincount parameter, so that users get the raw data and can define the behavior themselves.
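
To make the difference concrete, here is a small stand-alone sketch. It does not use Solr classes: plain maps stand in for the term counts, and `filterByValueCount` and `filterByDocCount` are hypothetical names. The first method models the `nl.size() >= minMatch` check quoted above; the second models filtering on each value's own document count. The per-manufacturer counts are made up so that they sum to 50.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MinCountSketch {

    /** Models the quoted check: the whole sub-pivot is dropped unless it has at least minMatch distinct values. */
    static Map<String, Integer> filterByValueCount(Map<String, Integer> termCounts, int minMatch) {
        return termCounts.size() >= minMatch
                ? new LinkedHashMap<>(termCounts)
                : new LinkedHashMap<>();
    }

    /** Models the suggested alternative: keep each value whose own document count is at least minCount. */
    static Map<String, Integer> filterByDocCount(Map<String, Integer> termCounts, int minCount) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
            if (e.getValue() >= minCount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // 50 'electronics' documents spread over 3 manu_id_s values (illustrative numbers).
        Map<String, Integer> manuCounts = new LinkedHashMap<>();
        manuCounts.put("belkin", 30);
        manuCounts.put("apple", 17);
        manuCounts.put("maxtor", 3);

        System.out.println(filterByValueCount(manuCounts, 5)); // prints {}
        System.out.println(filterByDocCount(manuCounts, 5));   // prints {belkin=30, apple=17}
    }
}
```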

          Dan Cooper added a comment -

          My project requires distributed facet pivot so I have written a patch to do this and attached it to the issue.

          The patch affects only FacetComponent and I have tested it against Solr 4.0 trunk. The latest revision number I have used is 1179956.

          It seems to work fine from my limited testing, although I haven't written any unit tests to check the functionality.

          As each shard already returns a pivot facet it's just a case of merging the pivots together as per the other facets.

          I hope this patch is useful.

          Ryan McKinley added a comment -

          it does not (yet) support distributed. An early patch did – but I don't have any experience with making stuff distributed, so that part is waiting for contributions....

          Mark Miller added a comment -

          I don't know the answer to that - I assume it doesn't yet? But I would consider this a bug myself in any case - the non-distrib components should throw a nice error if used in distrib.

          Upayavira added a comment -

          Is this supposed to support distributed search? I just tried it (using a recent trunk) - worked nicely across a single index but showed no entries across two shards.

          Fuad Efendi added a comment -

          Hi,

          Jason Folk posted:

          facet.tree currently seems to bark at exclusion tags, I wouldn't mind trying to take a crack at this (as I currently do need it), but not really sure where to begin looking.

          Is it resolved? My client currently uses "pivot" in production, with a few million records.

          If it's not resolved yet I can dig into it...

          Adeel Qureshi added a comment -

          Is this pivot faceting supposed to work with date fields? I tried it with a date field and it didn't return any pivot results for that field. I assume that even if it works, it still requires more information, like start, end and gap, but there is no mention of how to pass that information to this new pivot stuff.

          Toke Eskildsen added a comment -

          I'd be interested to hear what the focus of SOLR-792 is, as opposed to SOLR-64. Or to put it another way: If SOLR-64 was adapted to accept a list of fields for the hierarchy, what would the purpose of SOLR-792 be?

          Peter Karich added a comment -

          Hi Grant,

          ah, ok I see. Thanks for the explanation!

          Grant Ingersoll added a comment -

          Hi Peter,

          I like to think of it as "What if" faceting; it doesn't require the categories to be defined up front. You can solve this through hierarchical faceting, too, but this (pivot) approach doesn't require a traditional relationship description like hierarchical faceting does.

          Peter Karich added a comment -

          Hi Toke and all,

          maybe I am a bit evil or stupid, but could someone enlighten me as to why this patch is necessary?

          Why can't we use the existing mechanisms in Solr (facets!) and a bit of logic while indexing:

          http://markmail.org/message/2aza6nnsiw3l4bbb#query:+page:1+mid:3j3ttojacpjoyfg5+state:results

          This has no performance problems when using tons of categories. We are already using it with lots of categories. It works out of the box with nearly unlimited depth (either you need a DB -> unlimited, or the URL length is the limit).

          The only drawback of this approach is that you won't be able to display two or more 'branches' at the same time. Only one current branch with the current possible categories is possible, which is no limitation in our case, because the UI would be unusable if too many items were visible at the same time.

          One could introduce a special update component for this feature which uses a category tree (in RAM) built from the json or xml definition. I could create such a component if someone is interested.

          Regards,
          Peter.

          Toke Eskildsen added a comment -

          The current interface does not allow for nested queries. It is my understanding that this limits the functionality to conventional hierarchical faceting with the slight twist that the counts are for the current level instead of current level + sub levels, but that should be attainable with conventional hierarchical faceting too. This makes current pivot faceting a sub-set of SOLR-64, provided that SOLR-64 is adjusted to accept a list of fields as building blocks instead of expressing the hierarchy in a single field with delimiters. This is a good thing. It means that it can be done fast and memory-efficient as well as sharing most of the interface and output format with SOLR-64.

          Now, if something like nested queries is introduced in the pivot faceting interface, this changes the requirements of the underlying code as a complete recount is needed for each level. One evil nested query could be "Select the documents where field X contains the last letter of the current tag plus the first letter of the original query". This makes it hard (I try and avoid using the word "impossible") to create an implementation without query-explosion.

          So where am I going with all this? My point is that the interface (of course) dictates how responsive the implementation can be. Focusing on interfaces and using small-scale test data does carry a risk of ending up with something that is inherently slow. It might be unfeasible to attain high scalability with a given interface addition and that is okay - as long as that cost is known and accepted. Hence my questions about scale and my musings about how to do it faster.

          Ryan McKinley added a comment -

          I like: FieldType.toObject(SchemaField sf, BytesRef indexedTerm)

          Down the road, i think we want something similar for DocValues

          Yonik Seeley added a comment -

          Here's a draft patch that adds FieldType.toObject(SchemaField sf, BytesRef indexedTerm)
          and then uses that in the facet pivot code.

          Ryan McKinley added a comment -

          thanks yonik – i just got back from a conference... I can look at this later if you need.

          Toke – re performance improvements? Yes, there are lots of places this could be improved (patches always welcome!) – the goal in this first implementation was to get something working with an interface (HTTP) that we like.

          Yonik Seeley added a comment -

          I really only reviewed the output format, figuring any performance deficiencies could be addressed later.

          Toke Eskildsen added a comment -

          As I read the code, the implementation performs a faceting call for each tag that it encounters. I know the query is simple and that filters are used to speed up the calls, but it still sounds awfully expensive to me. I think it can be done without the recursive faceting calls by creating packed representations (using ordinals) of the tags in the pivot fields for all documents and doing faceting on those. However, if the current implementation works fine for larger data sets, there's no reason to rework it. Has anyone performed scalability testing on the current implementation?
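
One possible reading of the packed-ordinal idea, as a rough sketch rather than anything from the patches. It assumes two single-valued fields where every document has a value, with per-document term ordinals already available as plain `int` arrays; `countPairs` and its parameters are hypothetical. Each (ordA, ordB) pair is packed into one index, so all second-level counts come out of a single pass over the documents instead of one faceting call per first-level tag.

```java
import java.util.Arrays;

public class PackedPivotSketch {

    /**
     * Counts every (ordA, ordB) combination in one pass over the documents.
     * ordsA[d] and ordsB[d] are the term ordinals of document d in the two pivot fields.
     * The count for a pair is stored at index ordA * cardinalityB + ordB.
     */
    static int[] countPairs(int[] ordsA, int cardinalityA, int[] ordsB, int cardinalityB) {
        // A dense array is only reasonable for small cardinalities; a sparse map
        // keyed by the packed value would be needed for large ones.
        int[] counts = new int[cardinalityA * cardinalityB];
        for (int doc = 0; doc < ordsA.length; doc++) {
            int packed = ordsA[doc] * cardinalityB + ordsB[doc];
            counts[packed]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Five documents, two values per field (illustrative data).
        int[] ordsA = {0, 0, 1, 1, 1};
        int[] ordsB = {0, 1, 1, 1, 0};
        int[] counts = countPairs(ordsA, 2, ordsB, 2);
        System.out.println(Arrays.toString(counts)); // prints [1, 1, 1, 2]
    }
}
```

Multi-valued fields and missing values would need extra handling that this sketch leaves out.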

          Yonik Seeley added a comment -

          I can reproduce the error Fuad is seeing with this:
          http://localhost:8983/solr/select?q=*:*&facet=true&facet.pivot=popularity,manu_exact

                  <str name="field">popularity</str>
                  <str name="value">ERROR:SCHEMA-INDEX-MISMATCH,stringValue=`#8;#0;#0;#0;#6;</str>
          
          Ryan McKinley added a comment -

          I think keeping a single FacetComponent, but making it easier to build custom ones is a good idea. Right now SimpleFacets is rather complex and could be broken into many classes.

          Yonik Seeley added a comment -

          Ability to discretely enable/disable them by removing them from the "components" list (ie: maybe you need facet.query and facet.range but you don't want facet.field and facet.pivot to be available because of the performance impacts)

          This use case doesn't make a lot of sense to me - requests given to solr should just work. We're not handling security/authorization at the Solr level for the most part. The exception to this is (e)dismax, which is explicitly meant for passing through a raw user query.

          I personally think that 10 different facet components would be ugly, and at the end of the day, doesn't really help solve anyone's real problems.

          ability to see distinct timing data from each of them independently

          This is just a debugging issue. Even if you could separate facet.field from facet.query, if you had multiple facet.fields, you still wouldn't know which one is taking up all the time.

          Hoss Man added a comment -

          If we keep it as a separate component, then it puts an additional burden on people to remember to configure it, and figure out where to put it - before or after the "normal" facet component. And if we add it as a default component that is always there, then stuff like debugging output, etc, will list this separate component.

          I'm not sure I see a benefit to this being a "user visible" component.

          As far as i'm concerned, all of the things you listed are valuable reasons why this should be a user visible component – they are also reasons why i think we should try to refactor the existing FacetComponent into separate components:

          • Ability to discretely enable/disable them by removing them from the "components" list (ie: maybe you need facet.query and facet.range but you don't want facet.field and facet.pivot to be available because of the performance impacts)
          • ability to see distinct timing data from each of them independently

          ..refactoring the existing component should be a separate Jira, but since this work was already done to keep the pivot code isolated, combining it now seems like a bad idea (unless i'm the only person who thinks these should all be distinct, user visible, components)

          Ryan McKinley added a comment -

          Here is a patch that moves the pivot functionality to a helper class for the FacetComponent.

          One bug I found was that although SearchComponents implement NamedListInitializedPlugin, the default components never have init called (so far this has been OK since the standard components don't use init)

          This does not support distributed search, but I kept some stuff commented out that may be helpful for someone who understands it (and can easily test).

          I will commit shortly...

          Ryan McKinley added a comment -

          Re separate component

          I'm fine merging the behavior into FacetComponent and keeping the logic in a different class (PivotFacetHelper?) – I'll take a crack at that, but will save any distributed stuff for someone else.

          Ryan McKinley added a comment -

          This patch converts the field value to an Object rather than just using the string value.

          String internal = ftype.toInternal( kv.getKey() );

          then later:

          pivot.add( "value", ftype.toObject( f ) );

          This could obviously be optimized, but this is the easiest way to get the behavior we want

          Grant Ingersoll added a comment -

          Regarding separate components: I think it's enough to be separate classes. But that's really more of an internal code organization thing. The important part is that people be able to do "facet=true&facet.pivot=..."

          Most won't care about SimpleFacets.

          +1. I don't see a need for a separate component, but do agree that a separate class makes sense.
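
For reference, a small sketch of how a client might assemble such a request URL. The `facet`, `facet.pivot` and `q` parameters and the `localhost:8983/solr/select` endpoint are the ones used elsewhere in this issue; the `buildQuery` helper and the field names passed to it are illustrative assumptions, and only standard JDK classes are used.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class PivotRequestSketch {

    /** Appends URL-encoded parameters to a base URL, preserving insertion order. */
    static String buildQuery(String baseUrl, Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&", baseUrl + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("facet", "true");
        // Comma-separated list of fields, outermost pivot level first.
        params.put("facet.pivot", "cat,manu_id_s");

        System.out.println(buildQuery("http://localhost:8983/solr/select", params));
        // prints http://localhost:8983/solr/select?q=*%3A*&facet=true&facet.pivot=cat%2Cmanu_id_s
    }
}
```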

          Yonik Seeley added a comment -

          Shouldn't this be a part of the default SearchComponent chain?

          I think so.

          Regarding separate components: I think it's enough to be separate classes. But that's really more of an internal code organization thing. The important part is that people be able to do "facet=true&facet.pivot=..."
          Most won't care about SimpleFacets.

          If we keep it as a separate component, then it puts an additional burden on people to remember to configure it, and figure out where to put it - before or after the "normal" facet component. And if we add it as a default component that is always there, then stuff like debugging output, etc, will list this separate component.

          I'm not sure I see a benefit to this being a "user visible" component.

          Grant Ingersoll added a comment -

          Shouldn't this be a part of the default SearchComponent chain? I seem to recall our general guideline was that if it didn't require any extra setup (i.e. like spell checking) that it should just be a part of the chain.

          Ryan McKinley added a comment -

          yonik – I just added some simple docs... sorry for the delay.

          re native types: yes, it would be great to have the native type. It is currently a string because that is the interface exposed by SimpleFacets:

          NamedList<Integer> f = SimpleFacets.getTermCounts(field);
          

          for a native list, we would want something like:

          List<KeyValuePair<Object,Integer>> f = SimpleFacets.getNativeTermCounts(field);
          

          or convert the string back to a value as it is added to the result? I'm not sure the best approach.
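The second option (converting each string label back to a typed value as it is added to the result) might look roughly like the sketch below. It is self-contained and deliberately avoids Solr classes: `NativeValueSketch`, `toNativeCounts`, and the `Function<String, Object>` converter are hypothetical stand-ins for whatever the field type would actually provide.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names exist in Solr.
class NativeValueSketch {

    // Convert string-keyed term counts into (typed value, count) pairs,
    // keeping the iteration order of the input map.
    static List<Map.Entry<Object, Integer>> toNativeCounts(
            Map<String, Integer> termCounts, Function<String, Object> converter) {
        List<Map.Entry<Object, Integer>> out = new ArrayList<Map.Entry<Object, Integer>>();
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
            Object typed = converter.apply(e.getKey());
            out.add(new SimpleEntry<Object, Integer>(typed, e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("6", 5);
        counts.put("7", 5);
        // Assumed converter for an integer field such as "popularity".
        List<Map.Entry<Object, Integer>> typed =
                toNativeCounts(counts, s -> Integer.valueOf(s));
        for (Map.Entry<Object, Integer> e : typed) {
            System.out.println(e.getKey().getClass().getSimpleName() + " " + e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Converting at output time keeps the string-based counting interface untouched, at the cost of one conversion per emitted value.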

          re separate component: I think "SimpleFacets" are no longer "Simple" and we should make an effort to break that into more components rather than one massive one.

          re mincount default=1: we could make the default 0 if people think that makes more sense.

          re deque / java6: This could easily be changed to use a non-Java 6 interface. Patches welcome. But as Hoss said, 1.5 will not likely be released, though this should be applied to 3.x at some point.

          Hoss Man added a comment -

          I'm just curious why this needs to be a full blown search component.

          Yonik: I believe this was largely motivated by various discussions about the fact that FacetComponent is getting unwieldy and should really be split up into discrete SearchComponents for each type of faceting.

          What are the system requirements of SOLR 1.5 going to be?

          Christian: Solr 1.5 will probably never exist. This patch was committed to trunk which will eventually be Solr 4.0 and will definitely require Java 1.6...

          http://wiki.apache.org/solr/Solr1.5
          http://wiki.apache.org/solr/Solr3.1
          http://wiki.apache.org/solr/Solr4.0

          Christian Kesselheim added a comment -

          java.util.Deque was introduced as part of Java SE 6.0. As a result, applying this patch effectively renders SOLR 1.5 unusable on any earlier version of Java (e.g. 1.5).

          Is that by design? What are the system requirements of SOLR 1.5 going to be?

          Yonik Seeley added a comment -

          Keeping the facet pivot code in its own classes is good - but I'm just curious why this needs to be a full blown search component. It even puts its output right where the facet component does (i.e. under facet_counts).
          Although I'm also happy that the addition of this component doesn't add yet-another empty "facet_xxx" to facet_counts when it's not being used.

          Hoss Man added a comment -

          Jira summary update based on the consensus of what this type of functionality should be called

          Yonik Seeley added a comment -

          One thing I noticed is that the "value" is always a string. Example: "value":"6" as opposed to "value":6 when pivoting by popularity.

          Result grouping on the other hand, does use the native value type:
          http://localhost:8983/solr/select?q=*:*&group=true&group.field=popularity

          One way to think about it is that the labels for faceting normally use string values. But that's because they must for something like JSON. A different way of thinking about it is that whenever we have values (as opposed to keys) we should use "native" types boolean, int, float, etc.

          Thoughts?

          Yonik Seeley added a comment -

          Hey guys - I was planning on sticking this into my "Lucene Revolution" presentation... but I'm not seeing any docs on it.
          Could someone take a shot at adding a section on pivot faceting to http://wiki.apache.org/solr/SimpleFacetParameters

          Fuad Efendi added a comment -

          The default value (as seen in the code) is
          facet.pivot.mincount=1

          It confused me during simple tests (showing wrong results). Finally I found I need to add explicitly
          &facet.pivot.mincount=0
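To illustrate why the default can look like "wrong results": with a minimum count of 1, any value that matches no documents is silently dropped from the output. The sketch below only demonstrates that filtering effect on an in-memory map. `MinCountSketch` and `applyMinCount` are hypothetical names, and this is not the code from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of a mincount-style filter; not Solr code.
class MinCountSketch {

    // Keep only the buckets whose count is at least mincount.
    static Map<String, Integer> applyMinCount(Map<String, Integer> counts, int mincount) {
        Map<String, Integer> kept = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= mincount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> inStock = new LinkedHashMap<String, Integer>();
        inStock.put("true", 5);
        inStock.put("false", 0);
        System.out.println(applyMinCount(inStock, 1)); // zero-count bucket removed
        System.out.println(applyMinCount(inStock, 0)); // every bucket kept
    }
}
```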

          Yonik Seeley added a comment -

          1.4.x is for bugfixes only.

          Ryan McKinley added a comment -

          It can be backported easily, but I don't quite know the ropes on when and how stuff is backported. Since this is a new API, I thought it would be good to let it settle in trunk before porting it to 1.4.1...

          Lance Norskog added a comment -

          Can this be back-ported (easily) to Solr 1.4.1? Is it dependent on new features?

          Ryan McKinley added a comment -

          Thanks Erik – i'll commit this to /trunk soon and we can patch against that for distributed support

          Erik Hatcher added a comment -

          handing this one over to Ryan, as I don't have cycles to work on it anytime soon. Rock on Ryan...

          Ryan McKinley added a comment -

          updated to trunk

          Ryan McKinley added a comment -

          Updated with (slightly) better javadocs and moved the Queue to a Deque.

          I think this (or something similar) should get committed to /trunk soon... when distributed search gets implemented, then we should look at porting to 3.x

          Ryan McKinley added a comment -

          Here is a patch that allows for deeper nesting and more info is stored. This creates output that looks like:

          http://localhost:8983/solr/select?q=*:*&facet.pivot=cat,popularity,inStock&facet.pivot=popularity,cat&facet=true&facet.field=cat&facet.limit=5&rows=0&wt=json&indent=true&facet.pivot.mincount=0

              "facet_pivot":{
                "cat,popularity,inStock":[{
                    "field":"cat",
                    "value":"electronics",
                    "count":17,
                    "pivot":[{
                        "field":"popularity",
                        "value":"6",
                        "count":5,
                        "pivot":[{
                            "field":"inStock",
                            "value":"true",
                            "count":5}]},
                      {
                        "field":"popularity",
                        "value":"7",
                        "count":5,
                        "pivot":[{
                            "field":"inStock",
                            "value":"true",
                            "count":3},
                          {
                            "field":"inStock",
                            "value":"false",
                            "count":2}]},
                      {
          ...
          

          and:

          <lst name="facet_pivot">
              <arr name="cat,popularity,inStock">
                <lst>
                  <str name="field">cat</str>
                  <str name="value">electronics</str>
                  <int name="count">17</int>
                  <arr name="pivot">
                    <lst>
                      <str name="field">popularity</str>
                      <str name="value">6</str>
                      <int name="count">5</int>
                      <arr name="pivot">
                        <lst>
                          <str name="field">inStock</str>
                          <str name="value">true</str>
                          <int name="count">5</int>
                        </lst>
                      </arr>
                    </lst>
                    <lst>
                      <str name="field">popularity</str>
                      <str name="value">7</str>
                      <int name="count">5</int>
                      <arr name="pivot">
                        <lst>
                          <str name="field">inStock</str>
                          <str name="value">true</str>
                          <int name="count">3</int>
                        </lst>
                        <lst>
                          <str name="field">inStock</str>
                          <str name="value">false</str>
                          <int name="count">2</int>
                        </lst>
                      </arr>
                    </lst>
          ...
          

          This patch still needs some work to make it work with distributed search
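For illustration, a client consuming this nested field/value/count/pivot structure could walk it recursively. The sketch below is an assumption-laden stand-in, not SolrJ: `PivotWalkSketch`, `PivotNode`, `sample`, and `flatten` are hypothetical names, and the sample data simply mirrors the fragment shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side model of the nested pivot output; not SolrJ.
class PivotWalkSketch {

    // One entry of the output: field, value, count, and child pivots.
    static final class PivotNode {
        final String field;
        final String value;
        final int count;
        final List<PivotNode> pivot;

        PivotNode(String field, String value, int count, PivotNode... children) {
            this.field = field;
            this.value = value;
            this.count = count;
            this.pivot = Arrays.asList(children);
        }
    }

    // The "electronics" fragment from the example response.
    static PivotNode sample() {
        return new PivotNode("cat", "electronics", 17,
                new PivotNode("popularity", "6", 5,
                        new PivotNode("inStock", "true", 5)),
                new PivotNode("popularity", "7", 5,
                        new PivotNode("inStock", "true", 3),
                        new PivotNode("inStock", "false", 2)));
    }

    // Depth-first walk producing one indented line per node.
    static List<String> flatten(PivotNode node, String indent) {
        List<String> lines = new ArrayList<String>();
        lines.add(indent + node.field + "=" + node.value + " (" + node.count + ")");
        for (PivotNode child : node.pivot) {
            lines.addAll(flatten(child, indent + "  "));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : flatten(sample(), "")) {
            System.out.println(line);
        }
    }
}
```

Because every level repeats the same three keys plus an optional child list, one recursive function handles any nesting depth.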

          Hoss Man added a comment -

          1) ...

          Moves the parameter defines to FacetParams.java

          i'm not sure where they were in the original patch, but i really think they should be in their own PivotFacetParams class (just as having this be a distinct SearchComponent from the existing bloated FacetComponent is nice and keeps things manageable, having the params in a separate class is also nice ... i hope to get around to refactoring FacetComponent into oblivion someday)

          2) when i was working on Range Faceting (superset of DateFaceting) yonik pointed out that having the metadata mixed with the counts (like date faceting used) was a bad idea, and that we should really have a "counts" sub list for managing the actual counts, and keep the meta-data at the top level. with that in mind, i think what you have in your latest example looks great – i would just suggest that we rename the "pivot" key to "counts" for consistency, and then rename the "count" key to something else ("total" or "total-count" perhaps?)

          3) the one thing that still kind of bugs me about this component's param structure is the way it takes in a comma separated list of field names and then uses that comma separated list as the "key" in the response. I'm wondering if a URL structure like this would be better...

          http://localhost:8983/solr/select?q=*:*&facet.pivot=my_name&facet.pivot.my_name=cat&facet.pivot.my_name=popularity&facet.pivot.my_name=inStock&facet=true

          ...where "my_name" then becomes the response key under the "facet_pivot" list?

          that way we don't add any more features that break if you have some special character in a field name
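A rough sketch of how such named pivot parameters might be resolved is shown below. This parameter scheme is only the proposal from this comment, and `NamedPivotParamsSketch` and `resolve` are hypothetical names; the multi-valued parameter map stands in for whatever request-parameter API the component would actually use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resolution of the proposed "facet.pivot.<name>" parameters.
class NamedPivotParamsSketch {

    // params holds multi-valued request parameters in request order.
    // Returns pivot name -> ordered list of fields to pivot on.
    static Map<String, List<String>> resolve(Map<String, List<String>> params) {
        Map<String, List<String>> pivots = new LinkedHashMap<String, List<String>>();
        List<String> names = params.get("facet.pivot");
        if (names == null) {
            return pivots; // no pivots requested
        }
        for (String name : names) {
            List<String> fields = params.get("facet.pivot." + name);
            if (fields == null) {
                fields = new ArrayList<String>();
            }
            pivots.put(name, new ArrayList<String>(fields));
        }
        return pivots;
    }

    public static void main(String[] args) {
        Map<String, List<String>> params = new LinkedHashMap<String, List<String>>();
        params.put("facet.pivot", Arrays.asList("my_name"));
        params.put("facet.pivot.my_name", Arrays.asList("cat", "popularity", "inStock"));
        System.out.println(resolve(params));
    }
}
```

Since each field arrives as its own parameter value, no delimiter has to be chosen and field names containing commas would not need escaping.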

          Ryan McKinley added a comment -

          I'm messing with a new implementation that allows deeper nesting. To get this to work, the output needs to be a bit more verbose. Consider:

          http://localhost:8983/solr/select?q=*:*&facet.pivot=cat,popularity,inStock&facet.pivot=popularity,cat&facet=true&facet.field=cat&facet.limit=5&rows=0&wt=json&indent=true

            "facet_pivot":{
                "cat,popularity,inStock":[{
                    "field":"cat",
                    "value":"electronics",
                    "count":17,
                    "pivot":[{
                        "field":"popularity",
                        "value":"6",
                        "count":5,
                        "pivot":[{
                            "field":"inStock",
                            "value":"true",
                            "count":5}]},
                      {
                        "field":"popularity",
                        "value":"7",
                        "count":5,
                        "pivot":[{
                            "field":"inStock",
                            "value":"true",
                            "count":3},
                          {
                            "field":"inStock",
                            "value":"false",
                            "count":2}]},
                      {
          ...
          

          This nested faceting will look great in:
          http://download.carrotsearch.com/circles/demo/

          Ryan McKinley added a comment -

          This takes the existing patch and:

          1. Renames TreeFacetComponent to 'PivotFacetComponent' (i like that name best)
          2. Moves the parameter defines to FacetParams.java
          3. Adds pivot support to Solrj so the fields are used easily
          4. Adds tests using the solrj API
          5. Augments the results with the matching facet count – this is kind of hacky
            the name is prefixed with its count; in the sample data, this is:
            <lst name="facet_pivot">
              <lst name="cat,popularity">
                <lst name="17:electronics">
                  <int name="6">5</int>
                  <int name="7">5</int>
                  <int name="5">3</int>
                   ...
                </lst>
                <lst name="6:memory">
                  <int name="5">3</int>
                  <int name="7">2</int>
                  ...
                </lst>
            

            This means that there are 17 things matching electronics and 6 matching memory.
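Part of what makes the count-prefixed names hacky is that every client has to split them apart again. A minimal sketch of that client-side parsing follows. `PrefixedNameSketch`, `countOf`, and `labelOf` are hypothetical names, and the code assumes every name contains at least one colon.

```java
// Hypothetical client-side parsing of names such as "17:electronics".
class PrefixedNameSketch {

    // The count is everything before the first colon.
    static int countOf(String name) {
        return Integer.parseInt(name.substring(0, name.indexOf(':')));
    }

    // The label is everything after the first colon, so labels that
    // themselves contain colons are preserved.
    static String labelOf(String name) {
        return name.substring(name.indexOf(':') + 1);
    }

    public static void main(String[] args) {
        String name = "17:electronics";
        System.out.println(labelOf(name) + " matches " + countOf(name) + " documents");
    }
}
```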

          Anyone have better ideas how we could include this info? I also considered:

              <lst name="electronics">
                <int name="6">5</int>
                <int name="7">5</int>
                <int name="5">3</int>
                 ...
                <int name="__count__">17</int>
              </lst>
          
          Jason Falk added a comment -

          facet.tree currently seems to bark at exclusion tags, I wouldn't mind trying to take a crack at this (as I currently do need it), but not really sure where to begin looking.

          David Smiley added a comment -

          Never mind my opinion favoring this be titled "tree". I was confusing this issue with SOLR-64, thinking this was just a competing implementation of the concept, not something different. FWIW, I prefer "Pivot" vs "Cross tabulation".

          Hoss Man added a comment -

          FWIW: there was an excel feature i remember using back in the day that this faceting reminded me of as well .. and i finally remembered that it was called "Pivot Tables" that was later renamed "Cross Tabulation"

          Both of these terms are apparently meaningful outside of excel as generic data summarization concepts...

          http://en.wikipedia.org/wiki/Pivot_table
          http://en.wikipedia.org/wiki/Cross_tabulation

          My current vote would be to call this the "CrossTabulationFacetComponent" ...but i'm also ok with DecisionTreeFacetComponent if people prefer.

          Erik Hatcher added a comment -

          The reason I called it tree faceting was based on the same thing Hoss just said... decision tree. One decision leads to these other values, and so on.

          But how it was used in the case I built it for was to show a two dimensional display of the data.

          I'm not tied to any particular name. Would DecisionTreeFacetComponent be acceptable? Any volunteers to step up and rename and implement the N-level stuff?

          Hoss Man added a comment -

          i'm with grant ... i had an IRC conversation with erik a while back where i pointed out that there isn't anything intrinsically "tree" oriented about this patch: it can be used in cases where you've got multiple fields expressing a tree structure, but it can also be useful for fields that are totally orthogonal to each other (ie: the "cat" and "inStock" example)

          What this is essentially doing is treating each field as a vector containing the constraint counts for that field, and taking a "cross product" to produce an N-dimensional matrix showing the counts for each permutation. Which led to my suggestion for "matrix faceting" or "cross product faceting"
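To make the cross-product view concrete, the sketch below computes nested counts by brute force over a small in-memory list of single-valued documents. It is only an illustration of what the counts mean, not how the patch computes them; `PivotCountSketch`, `Node`, `doc`, and `pivot` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: brute-force nested counting over in-memory documents.
class PivotCountSketch {

    // Count for one value, plus the counts for the next field among matching docs.
    static final class Node {
        int count;
        final Map<String, Node> children = new LinkedHashMap<String, Node>();
    }

    // Build a single-valued document from alternating field/value strings.
    static Map<String, String> doc(String... fieldsAndValues) {
        Map<String, String> d = new LinkedHashMap<String, String>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            d.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return d;
    }

    // For each document, descend one level per pivot field and bump the count.
    static Map<String, Node> pivot(List<Map<String, String>> docs, List<String> fields) {
        Map<String, Node> root = new LinkedHashMap<String, Node>();
        for (Map<String, String> d : docs) {
            Map<String, Node> level = root;
            for (String field : fields) {
                String value = d.get(field);
                if (value == null) {
                    break; // document has no value for this field
                }
                Node node = level.get(value);
                if (node == null) {
                    node = new Node();
                    level.put(value, node);
                }
                node.count++;
                level = node.children;
            }
        }
        return root;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
                doc("cat", "electronics", "inStock", "true"),
                doc("cat", "electronics", "inStock", "false"),
                doc("cat", "memory", "inStock", "true"));
        Map<String, Node> counts = pivot(docs, Arrays.asList("cat", "inStock"));
        System.out.println("electronics: " + counts.get("electronics").count);
        System.out.println("electronics/inStock=true: "
                + counts.get("electronics").children.get("true").count);
    }
}
```

The fields here need not be hierarchically related: "cat" and "inStock" are independent, yet every combination that occurs in the data gets a count.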

          The other way to look at it is that it's "Decision Tree Faceting" in that it tells you "for facet A, the constraints/counts are X/N, Y/M, etc.... if you were to constrain A by X, then the constraint counts for B would be S/P, T/Q, etc..." ie: it tells you in advance what the "next" set of facet results would be for a field if you apply a constraint from the current facet results.

          but calling it just "Tree Faceting" seems misleading to me. that seems more applicable to something like SOLR-64

          Grant Ingersoll added a comment -

          I don't know, tree implies something different (a taxonomy) to me, in that you need tools and capabilities for managing it that are also a part of Solr, whereas grouping items together at just level 2 is useful in its own right and doesn't require that other stuff to make it manageable. Just my two cents.

          David Smiley added a comment -

          I like "nested faceting" better than "grid faceting" but I prefer "tree faceting" most of all.

          I think N-levels is key.

          Erik Hatcher added a comment -

          How about "nested faceting"? "grid faceting"?

          Should it be generalized to N levels of nesting before committing? Or are folks ok with it being limited to two levels at first, generalizing it more after committing?

          Grant Ingersoll added a comment -

          From left field, the name "Grouped Faceting" or some variation of it comes to mind when I think of this functionality. Would be nice to get this committed at some point.

          Erik Hatcher added a comment -

          Updated patch to trunk, very minor cosmetic differences.

          SolrFan added a comment -

          Hi, can this patch please be updated against the current 1.4 trunk? thanks.

          Thibaut Lassalle added a comment -

          Update to apply cleanly against release 1.4

          Jeremy Hinegardner added a comment -

          Update to apply cleanly against trunk.

          Erik Hatcher added a comment -

          See http://wiki.apache.org/solr/HierarchicalFaceting for stats on this approach and comparing/contrasting to SOLR-64

          Jeremy Hinegardner added a comment - - edited

          I've attempted to update SOLR-792 to work distributed. This works with my test setup.
          Is adding a new field to ResponseBuilder the proper way to implement this? That's what FacetComponent does, so I followed its example.

          To apply to HEAD, use

          patch -p1 < SOLR-792.patch
          

          I also changed to SimpleOrderedMap in a few places where it was NamedList. It seemed more appropriate.

          Erik Hatcher added a comment -

          fixes inner TermQuery to use actual internal indexed value

          Hoss Man added a comment -

          SOLR-64 is (in theory) about better faceting support for fields that represent a hierarchy.

          What Erik is addressing seems to me more like generating an "N-dimensional matrix" of facet counts.

          Shalin Shekhar Mangar added a comment -

          Is this related to (or the same as) SOLR-64?

          Erik Hatcher added a comment - - edited

          This patch is a simple implementation to do a fixed two-level faceting, using the SimpleFacets functions. This is just the start. The idea is to make the actual faceting implementation pluggable, support arbitrary levels, perhaps also support nested facet queries, not just facet fields.

          With this patch, this query, on Solr's example data set, returns the data below:

          http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=cat,inStock&wt=json&indent=on

           "facet_counts":{
            "facet_queries":{},
            "facet_fields":{
          	"cat":[
          	 "electronics",14,
          	 "memory",3,
          	 "card",2,
          	 "connector",2,
          	 "drive",2,
          	 "graphics",2,
          	 "hard",2,
          	 "monitor",2,
          	 "search",2,
          	 "software",2,
          	 "camera",1,
          	 "copier",1,
          	 "multifunction",1,
          	 "music",1,
          	 "printer",1,
          	 "scanner",1]},
            "facet_dates":{},
            "trees":[
          	"cat,inStock",[
          	 "electronics",[
          	  "true",10,
          	  "false",4],
          	 "memory",[
          	  "true",3,
          	  "false",0],
          	 "card",[
          	  "false",2,
          	  "true",0],
          	 "connector",[
          	  "false",2,
          	  "true",0],
          	 "drive",[
          	  "true",2,
          	  "false",0],
          	 "graphics",[
          	  "false",2,
          	  "true",0],
          	 "hard",[
          	  "true",2,
          	  "false",0],
          	 "monitor",[
          	  "true",2,
          	  "false",0],
          	 "search",[
          	  "true",2,
          	  "false",0],
          	 "software",[
          	  "true",2,
          	  "false",0],
          	 "camera",[
          	  "true",1,
          	  "false",0],
          	 "copier",[
          	  "true",1,
          	  "false",0],
          	 "multifunction",[
          	  "true",1,
          	  "false",0],
          	 "music",[
          	  "true",1,
          	  "false",0],
          	 "printer",[
          	  "true",1,
          	  "false",0],
          	 "scanner",[
          	  "true",1,
          	  "false",0]]]}}
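
          For anyone consuming this response on the client side: the "trees" section uses the same flat, alternating name/value list layout as "facet_fields", nested one level deeper. A minimal sketch of turning it into nested dictionaries follows, assuming the JSON shape shown above. The helper names pairs and parse_trees are hypothetical, and the embedded response is a trimmed copy that keeps only two of the "cat" values.

```python
import json

# Trimmed copy of the response shown above (two "cat" values only).
response = json.loads("""
{"facet_counts": {"trees": ["cat,inStock", [
    "electronics", ["true", 10, "false", 4],
    "memory", ["true", 3, "false", 0]]]}}
""")


def pairs(flat):
    """Turn a flat [k1, v1, k2, v2, ...] list into a dict, keeping order."""
    return dict(zip(flat[0::2], flat[1::2]))


def parse_trees(trees):
    """Map each 'fieldA,fieldB' key to {valueA: {valueB: count}}."""
    parsed = {}
    for fields, values in pairs(trees).items():
        parsed[fields] = {
            value: pairs(counts) for value, counts in pairs(values).items()
        }
    return parsed


trees = parse_trees(response["facet_counts"]["trees"])
print(trees["cat,inStock"]["electronics"]["true"])
```

          Converting to a dict discards nothing here because each constraint value appears once per level; a client that needs the original ordering can rely on Python dicts preserving insertion order.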
          

            People

            • Assignee:
              Yonik Seeley
              Reporter:
              Erik Hatcher
            • Votes:
              13
              Watchers:
              29


                Development