Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: search
    • Labels:
      None

      Description

      First pass at basic facet support. The initial patch includes utilities for use in RequestHandlers, plus usage in StandardRequestHandler (DisMax should be switched to SolrParams before attempting to add this).

      Basic idea is that:

      • facet=true indicates facet counts are desired.
      • facetField=inStock indicates we want a count of the matching docs for each value in the field inStock
      • facetQuery=title:ipod indicates we want the count of matching docs that are also in the set of docs matching the query title:ipod
      • if user wants to apply a facet constraint on subsequent queries, they can add an "fq" (filter query) param (support for this was added to StandardRequestHandler as well)
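A minimal sketch of the counting semantics described in the list above, using plain Python sets rather than Solr itself. The toy documents and the helper names (`matching`, `facet_field_counts`, `facet_query_count`) are hypothetical and only illustrate the idea; they are not part of the patch.

```python
from collections import Counter

# Toy index: doc id -> field values (made-up data, not the Solr example docs).
DOCS = {
    1: {"title": "ipod video", "inStock": "true"},
    2: {"title": "ipod nano", "inStock": "false"},
    3: {"title": "video card", "inStock": "true"},
    4: {"title": "hard drive", "inStock": "true"},
}

def matching(term, field="title"):
    """Return the set of doc ids whose field contains the given word."""
    return {doc_id for doc_id, doc in DOCS.items() if term in doc[field].split()}

def facet_field_counts(doc_set, field):
    """facetField: count the docs in doc_set for each value of the field."""
    return Counter(DOCS[doc_id][field] for doc_id in doc_set)

def facet_query_count(doc_set, query_set):
    """facetQuery: count docs that are in both the result set and the query's set."""
    return len(doc_set & query_set)

results = matching("video")                          # main query, q=video
in_stock = facet_field_counts(results, "inStock")    # facetField=inStock
ipod = facet_query_count(results, matching("ipod"))  # facetQuery=title:ipod
narrowed = results & matching("ipod")                # fq=title:ipod on a follow-up request
```

The facet counts never change the result set of the current request; only the follow-up "fq" narrows it.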

      Things marked TODO...

      • add support for per field facetLimit indicating that only the top N items in each facetField should be returned
      • add support for a per field facetZero boolean indicating that there is no reason to bother returning counts of 0 for facetFields (some clients may want to know the list, others don't care)
      • potential optimization when using facetLimit: cache the terms with the highest docFreq and see if they provide all the info we need without doing a full TermEnum
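As a rough illustration of the two TODO options above, the sketch below trims a map of facet counts. It assumes "top N" means highest count first and breaks ties alphabetically; both the function name `trim_facet_counts` and the tie-breaking rule are assumptions for illustration, not behavior taken from the patch.

```python
def trim_facet_counts(counts, limit=None, include_zeros=True):
    """Apply facetLimit / facetZero style trimming to a {value: count} mapping."""
    items = list(counts.items())
    if not include_zeros:
        # Drop values that no matching doc has.
        items = [(value, count) for value, count in items if count > 0]
    # Highest counts first; ties broken by value so the output is deterministic.
    ordered = sorted(items, key=lambda vc: (-vc[1], vc[0]))
    if limit is not None:
        ordered = ordered[:limit]
    return ordered

cat_counts = {"electronics": 3, "graphics": 2, "card": 2, "music": 1, "memory": 0}
top_two = trim_facet_counts(cat_counts, limit=2)
non_zero = trim_facet_counts(cat_counts, include_zeros=False)
```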

      I'd like to get some feedback on the overall approach and params before I proceed too much farther.

      1. ASF.LICENSE.NOT.GRANTED--simple-facets.patch
        13 kB
        Hoss Man
      2. simple-facets.patch
        24 kB
        Hoss Man
      3. simple-facets.patch
        33 kB
        Hoss Man
      4. simple-facets.patch
        33 kB
        Hoss Man

        Activity

        Uwe Schindler made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hoss Man made changes -
        Fix Version/s 1.1.0 [ 12312234 ]
        Hoss Man added a comment -

        This bug was modified as part of a bulk update using the criteria...

        • Marked ("Resolved" or "Closed") and "Fixed"
        • Had no "Fix Version" versions
        • Was listed in the CHANGES.txt for 1.1

        The Fix Version for all 38 issues found was set to 1.1, email notification
        was suppressed to prevent excessive email.

        For a list of all the issues modified, search jira comments for this
        (hopefully) unique string: 20080415hossman3

        Hoss Man made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hoss Man added a comment -

        whoops ... i left this open to remind me to work on the wiki pages, which i did without remembering to resolve this.

        Hoss Man added a comment -

        patch committed ... i'll work on some better wiki pages this afternoon.

        Yonik Seeley added a comment -

        I've tried it out... seems to be working great!

        Hoss Man made changes -
        Attachment simple-facets.patch [ 12340313 ]
        Hoss Man added a comment -

        Small change to the way the fq params are processed...

        previously it was valid to have an "fq" param of "" that would be ignored by the DisMax handler (if it didn't ignore it, you got a parse error from QueryParser) ... i re-added that logic to the utility that now deals with the fq params for both standard and dismax.
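The behavior described above could be sketched roughly like this. The helper name `usable_filter_queries` is hypothetical, and treating whitespace-only values the same as "" is an assumption; the comment only mentions the empty string.

```python
def usable_filter_queries(fq_values):
    """Drop missing or blank fq values so they never reach the query parser."""
    # Assumption: whitespace-only values are treated like the empty string.
    return [fq for fq in (fq_values or []) if fq and fq.strip()]

filters = usable_filter_queries(["", "inStock:true", "  ", "cat:card"])
```

Filtering before parsing means a client can always send an "fq" parameter, even an empty one, without triggering a parse error.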

        Hoss Man made changes -
        Attachment simple-facets.patch [ 12340309 ]
        Hoss Man added a comment -

        revised patch includes facet support for the DisMax handler, and also changes the DisMax handler's use of "fq" to be multivalued (as the standard request handler is now)

        I'll probably commit this version sometime in the next 24 hours unless anyone spots any problems.

        Hoss Man made changes -
        Field Original Value New Value
        Attachment simple-facets.patch [ 12340119 ]
        Hoss Man added a comment -

        Per more mailing list discussion, a new version of the patch...

        1) param names have changed to match the conventions discussed for highlighting...

        facet - boolean - do facet counts or not
        facet.query - multival string - list of arbitrary query constraints to count
        facet.field - multival string - list of fields to treat as facets
        facet.missing - boolean, per field - count docs that have no value for field
        facet.zeros - boolean, per field - include facet field values with 0 counts
        facet.limit - int, per field - max number of field values to return (desc)

        2) note that the previously TODO "limit" and "zeros" functionality has been added

        3) note the addition of the "missing" option per discussion on the list

        4) response format has been modified slightly (not as extensively as discussed on the list, since there wasn't a clear consensus about a good API, but with a cleaner separation of query-based facets and facet fields)

        5) heavy refactoring: all functionality put into a "SimpleFacets" class which can be subclassed/composed to get individual pieces of functionality. usage of this class by StandardRequestHandler was also refactored into a method that subclassing handlers can override (to add complex facet behavior without giving up the other built-in goodness of StandardRequestHandler)

        A cool example that demonstrates everything with the example schema/docs...

        http://localhost:8983/solr/select/?facet.query=price:[400+TO+*]&facet.query=price:[*+TO+400]&q=video&start=0&rows=0&f.cat.facet.limit=8&facet.zeros=false&f.cat.facet.zeros=true&facet=true&facet.field=inStock&facet.field=cat&f.includes.facet.missing=true&facet.field=includes

        (NOTE: this is all still just used in StandardRequestHandler - not DisMax yet)
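To make the per-field overrides in the example URL easier to follow, here is a small client-side sketch that parses the same query string and resolves each facet field's settings. The `field_param` helper is hypothetical, and the lookup order (`f.<field>.<param>` first, then the global param) is my reading of the "per field" convention rather than code from the patch.

```python
from urllib.parse import parse_qs

# Query string copied from the example URL above.
QUERY = (
    "facet.query=price:[400+TO+*]&facet.query=price:[*+TO+400]&q=video&start=0&rows=0"
    "&f.cat.facet.limit=8&facet.zeros=false&f.cat.facet.zeros=true&facet=true"
    "&facet.field=inStock&facet.field=cat&f.includes.facet.missing=true&facet.field=includes"
)

params = parse_qs(QUERY)  # multi-valued params become lists; "+" decodes to a space

def field_param(params, field, name, default=None):
    """Return f.<field>.<name> if present, else the global <name>, else default."""
    for key in (f"f.{field}.{name}", name):
        if key in params:
            return params[key][0]
    return default

# Resolve the effective settings for every requested facet field.
resolved = {
    field: {
        "limit": field_param(params, field, "facet.limit"),
        "zeros": field_param(params, field, "facet.zeros"),
        "missing": field_param(params, field, "facet.missing"),
    }
    for field in params["facet.field"]
}
```

Under this reading, only "cat" gets a limit and zero counts, and only "includes" reports docs with no value; parameters that are not set at all come back as None here, since the defaults are not stated in this comment.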

        Hoss Man added a comment -

        Per mailing list discussion...
        1) Mike's points about parameter names are dead on, and i'll be making changes.
        2) Yonik pointed out I wasn't very forthcoming with examples, my bad.

        With the patch as it stands right now, a query like this (against the example schema/docs) ...

        http://localhost:8983/solr/select/?q=video&facetQuery=inStock:true&facetQuery=price:[*+TO+500]&facet=true

        ...would match on 3 docs, and would contain the following additional data...

        <lst name="facet_counts">
        <int name="inStock:true">1</int>
        <int name="price:[* TO 500]">2</int>
        </lst>
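For anyone scripting against this, a request like the one above can be assembled with Python's standard library. This is only a client-side sketch: the base URL and parameter names are taken from the example, while the variable names are made up for illustration.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/select/"

# A list of pairs (not a dict) so that facetQuery can be repeated.
pairs = [
    ("q", "video"),
    ("facetQuery", "inStock:true"),
    ("facetQuery", "price:[* TO 500]"),
    ("facet", "true"),
]

# Keep ":", "[", "]" and "*" readable; spaces are still encoded as "+".
url = BASE + "?" + urlencode(pairs, safe=":[]*")
```

Leaving the range-query characters unescaped is only for readability; the default percent-encoding should be decoded to the same values by the server.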

        The really powerful stuff comes into play when using facetField ...

        http://localhost:8983/solr/select/?indent=1&q=video&facetField=inStock&facetField=cat&facetQuery=price:[*+TO+500]&facet=true

        ...to get...

        <lst name="facet_counts">
          <int name="price:[* TO 500]">2</int>
          <lst name="inStock">
            <int name="true">1</int>
            <int name="false">2</int>
          </lst>
          <lst name="cat">
            <int name="search">0</int>
            <int name="memory">0</int>
            <int name="graphics">2</int>
            <int name="card">2</int>
            <int name="connector">0</int>
            <int name="software">0</int>
            <int name="electronics">3</int>
            <int name="copier">0</int>
            <int name="multifunction">0</int>
            <int name="camera">0</int>
            <int name="music">1</int>
            <int name="hard">0</int>
            <int name="scanner">0</int>
            <int name="monitor">0</int>
            <int name="drive">0</int>
            <int name="printer">0</int>
          </lst>
        </lst>
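A client could pull the counts out of a facet_counts element shaped like the one above roughly as follows. This sketch assumes the layout of this early patch (query counts as direct int children, one nested lst per facet field); the sample is abbreviated, and the function name `parse_facet_counts` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the response fragment above.
FACET_XML = """
<lst name="facet_counts">
  <int name="price:[* TO 500]">2</int>
  <lst name="inStock">
    <int name="true">1</int>
    <int name="false">2</int>
  </lst>
  <lst name="cat">
    <int name="graphics">2</int>
    <int name="card">2</int>
    <int name="electronics">3</int>
    <int name="music">1</int>
    <int name="search">0</int>
  </lst>
</lst>
"""

def parse_facet_counts(xml_text):
    """Split a facet_counts element into query counts and per-field counts."""
    root = ET.fromstring(xml_text.strip())
    # Direct <int> children are facetQuery counts, keyed by the query string.
    query_counts = {el.get("name"): int(el.text) for el in root.findall("int")}
    # Each nested <lst> holds the value counts for one facetField.
    field_counts = {
        lst.get("name"): {el.get("name"): int(el.text) for el in lst.findall("int")}
        for lst in root.findall("lst")
    }
    return query_counts, field_counts

query_counts, field_counts = parse_facet_counts(FACET_XML)
```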

        Mike Klaas added a comment -

        I haven't looked at the patch yet but in terms of the parameters, might it make sense to use a group name similar to the highlighter params? e.g., facet, facet.fl, facet.query, facet.limit, etc.

        Also, now that we have per-field override capability for params, we should document which params can be thus overridden (facet.zero, facet.limit?)

        Hoss Man created issue -

          People

          • Assignee:
            Hoss Man
            Reporter:
            Hoss Man
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
