Solr / SOLR-247

Allow facet.field=* to facet on all fields (without knowing what they are)

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      I don't know if this is a good idea to include – it is potentially a bad idea to use it, but that can be ok.

      This came out of trying to use faceting for the LukeRequestHandler top term collecting.
      http://www.nabble.com/Luke-request-handler-issue-tf3762155.html

      1. SOLR-247-FacetAllFields.patch
        1.0 kB
        Ryan McKinley
      2. SOLR-247.patch
        31 kB
        Lars Kotthoff
      3. SOLR-247.patch
        3 kB
        Lars Kotthoff
      4. SOLR-247.patch
        3 kB
        Lars Kotthoff

        Issue Links

          Activity

          Erik Hatcher added a comment -

          I can see value in supporting the dynamicField wildcard syntax, so *_facet would work. In fact, maybe that'd be a good syntax to support for all fl-like parameters too.

          * scares me, and it'd certainly be discouraged for anything but small indexes! But of course I don't have to use it.
          Hoss Man added a comment - - edited

          I have a really hard time imagining anything but the most trivial use cases for facet.field=* ... it doesn't really seem like a problem in need of a solution.

          with something like fl=*, we're only talking about stored fields ... storing a field makes no sense unless you plan on returning it in the field list some of the time, so fl=* makes sense as a "return all of the fields that are possible to return" option.

          There are lots of reasons why a field might be indexed though, so faceting on every indexed field doesn't seem like it would ever make sense.

          in my opinion a "best practice" is not to use fl=* unless you are debugging anyway, otherwise you find yourself getting slammed with large amounts of data you don't want as the index evolves over time ... something like facet.field=* would be worse because it's not just the amount of data getting returned that would increase, but the amount of computation (and time and poor cache performance) that would spike as well.

          if we do this, i would think it only makes sense to generalize the use of "*" in both fl and facet.field into a true glob style syntax, so we can at least encourage people who want this type of syntax to use a naming convention to help limit how much they hurt themselves.

          (i have no problem giving people enough rope to hang themselves, but we shouldn't tie a noose in the rope before we give it to them)

          Ryan McKinley added a comment -

          > There are lots of reasons why a field might be indexed though, so faceting on every indexed field doesn't seem like it would ever make sense.

          agreed, but *_facet would be useful

          > if we do this, i would think it only makes sense to generalize the use of "*" in both fl and facet.field into a true glob style syntax

          One issue is that fl=XXX is typically a single field list separated with "," or "|", whereas facet.field expects each field as a separate parameter.
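
To make the difference in parameter shape concrete, here is a minimal client-side sketch in Python. It is not Solr code; the helper name build_query and the field names (id, name, cat, manu) are placeholders chosen for illustration. It only shows that fl travels as one delimited value while facet.field is repeated once per field.

```python
from urllib.parse import urlencode


def build_query(q, fl_fields, facet_fields):
    """Build a Solr-style query string (hypothetical helper, illustration only)."""
    params = [
        ("q", q),
        # fl is a single parameter whose value is a comma-separated list
        ("fl", ",".join(fl_fields)),
        ("facet", "true"),
    ]
    # facet.field is repeated: one key/value pair per field
    params.extend(("facet.field", name) for name in facet_fields)
    return urlencode(params)


print(build_query("ipod", ["id", "name"], ["cat", "manu"]))
```

Any glob syntax shared by both parameters would therefore have to be expanded inside one delimited value for fl, but per repeated value for facet.field.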

          Hoss Man added a comment -

          see some follow up comments in the mailing lists...

          http://www.nabble.com/forum/Search.jtp?forum=14479&local=y&query=SOLR-247

          in a nutshell, i think this issue can be resolved as "won't fix" ... but i'm not opposed to leaving it open if someone wants to work on it. there are ways for people to configure solr so that all the fields they want to facet on are faceted on by default (when configuring the requestHandler), which is safer than wildcarding.

          Pieter Berkel added a comment -

          Some recent discussion on this topic:

          http://www.nabble.com/Structured-Lucene-documents-tf4234661.html

          I get the impression that general wildcard syntax support for field listing parameters (i.e. the reverse of dynamic fields) as described in the above thread would be far more useful than a simple '*' match-anything syntax (not only in faceting but other cases like hl.fl and perhaps even mlt.fl).

          I haven't really considered the performance issues of this approach however, as it would involve checking each field supplied in the parameter for '*' before expanding it into full field names for every query.

          Given the above, the fact that it could be used across multiple response handlers and subhandlers like SimpleFacets & Highlighting, and that it would require access to IndexReader to getFieldNames(), where might be the most sensible place to put this code?

          Matthew Runo added a comment - - edited

          http://www.nabble.com/Dynamic-fields---Facets-to14739422.html

          also provides a use case for this to be fixed. While I'd never do a facet on the wildcard, I'd love to be able to do one on attribute_<wildcard>. It just makes using the dynamic fields so much easier.

          Hoss Man added a comment -

          i've put some thoughts on the broader issues of having solr admin control over how field names are dealt with (globs, regexes, aliasing, etc...) in various contexts on the wiki...

          http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams

          ...it might be best to use that as a whiteboard for a design discussion since the ultimate issues are a little bigger than this issue originally set out to tackle.

          Lars Kotthoff added a comment -

          Attaching patch which implements support for wildcards in facet field specifications similar to SOLR-540. If the facet field specification contains an asterisk, every indexed field the reader knows about is matched against the corresponding regular expression.
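
The matching rule described above can be sketched roughly as follows. This is not the code from the attached patch, only a Python illustration under assumptions: the function name expand_facet_fields is hypothetical, and the list of indexed field names is passed in directly rather than read from an index reader.

```python
import re


def expand_facet_fields(specs, indexed_fields):
    """Expand facet field specs containing '*' against known indexed fields.

    Hypothetical helper: specs without '*' pass through unchanged; specs
    with '*' become a regular expression where each '*' matches any run
    of characters, and every indexed field is tested against it.
    """
    expanded = []
    for spec in specs:
        if "*" not in spec:
            candidates = [spec]
        else:
            # Escape the literal pieces so characters like '.' are not
            # treated as regex syntax, then join them with '.*'.
            pattern = re.compile(".*".join(re.escape(p) for p in spec.split("*")))
            candidates = [f for f in indexed_fields if pattern.fullmatch(f)]
        for name in candidates:
            if name not in expanded:  # keep first occurrence, drop repeats
                expanded.append(name)
    return expanded


fields = ["color_facet", "size_facet", "title", "cat"]
print(expand_facet_fields(["*_facet", "cat"], fields))
```

Note that the cost concern raised earlier in the thread still applies: a sketch like this scans every known field name for each wildcard spec on every request.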

          Note that the unit tests part of the patch sort of depends on SOLR-645. When applied to the current trunk it will create the new facets test file with all the old tests plus the new ones. This doesn't break anything, but it duplicates the old tests. I can provide a new patch either against the current trunk or against the trunk with SOLR-645 committed, whichever is required.

          Lars Kotthoff added a comment -

          Attaching new patch which applies to current TRUNK.

          Lars Kotthoff added a comment -

          Syncing patch with trunk.

          Shalin Shekhar Mangar added a comment -

          Lars, I see you have been updating the patches to trunk diligently. However, I'm not sure if there is a consensus on adding this without having a glob like feature in place.

          Do you have a use-case in mind which can be solved only with the current patch?

          Lars Kotthoff added a comment -

          Off the top of my head, having an automated feed parser which adds fields and facet_field to facet on. I agree that all this should be part of a global glob-like thing, but that would probably only apply to the part which parses the parameters anyway. How a glob is matched depends on the type of glob (i.e. whether the field is indexed/stored/... and we want to facet/highlight/...).

          If people start using it and it turns out to be important, it can always be refactored into something more general. If nobody uses globbing, there'd be no need to invest the effort of making it general.

          Avlesh Singh added a comment -

          I haven't tested this patch yet. But my belief is that the primary objective should be to support dynamic fields rather than pure wildcard field names. Dynamic fields offer a wide range of capabilities w.r.t. key-value(s) kinds of data. Most of the time people use such fields because the keys are not known upfront.

          If nothing more, this patch should at least cater to that audience.

          Jan Høydahl added a comment -

          Seems like there has not been much demand for this the last 4 years. Could this not be a nice task to do at the same time as SOLR-650?

          SPRING_CLEANING_2013

          Erick Erickson added a comment -

          My first reaction to this is that while it might have some limited use-cases with small indexes, as soon as one went to any decent size corpus it'd blow memory up. Not sure it's worth the effort, but I could be convinced otherwise...

          SOLR-650 seems something of a separate issue, it's much more controlled. That said, they're both really about how to specify the list of fields for faceting, so you're right in that they're part of the same concept....

          Jan Høydahl added a comment -

          I agree it's a terrible idea for anything production, but for discovery it could be nice. I often throw "unknown" data into an index with a catch-all <dynamicField name="*" type="string"/> kind of config, and then find myself specifying a lot of facet.field's to introspect what's in the various fields. For pure dev purposes it'd be a nice shortcut. So for me it can live as a newdev issue for still some time...

          Gowtham Gutha added a comment -

          Why doesn't it accept wildcards? That way, when creating the schema.xml, I could include the faceted fields with a suffix to identify them as facet fields.

          This would be great, and it can be fixed.

          http://localhost:8983/solr/select?q=ipod&rows=0&facet=true&facet.limit=-1&facet.field=*_facet&facet.mincount=1

          Jack Krupansky added a comment -

          The earlier commentary clearly lays out that the primary concern is that it would be a performance nightmare, but... that does depend on your particular use case.

          Personally, I would say to go forward with adding this feature, but with a clear documentation caveat that this feature "should be used with great care since it is likely to be extremely memory and performance intensive and more of a development testing tool than a production feature, although it could have value when wildcard patterns are crafted with care for a very limited number of fields."


            People

            • Assignee:
              Unassigned
              Reporter:
              Ryan McKinley
            • Votes:
              10
              Watchers:
              13
