  1. Solr
  2. SOLR-5972

Add new statistics facet capabilities to StatsComponent: limit, sort and missing.

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      It would be very useful to be able to limit and sort the StatsComponent facet response.
      I chose to implement this in StatsComponent rather than in the Analytics component because Analytics doesn't support distributed queries yet.

      The default for limit is -1, which returns all facet values.
      The default for sort is no sorting.
      The default for missing is true.
      So if you use StatsComponent exactly as before, the response is unchanged.
      If you ask for sort or limit, the missing facet value is listed last, as in regular faceting.
      Supported sort types: min, max, sum and countdistinct for stats fields, and count and index for facet fields (all sort types are lowercase).
      Sort directions asc and desc are supported.
      Sorting by multiple fields is supported.
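
The semantics described above can be illustrated with a small in-memory sketch. It is not the patch's code: the function name `apply_facet_options`, the dict-based bucket model, and the choice to apply `limit` only to non-missing values are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: facet value -> {stat name -> value} for one stats field.
# The key None stands for the "missing" bucket (documents without a facet value).
Buckets = Dict[Optional[str], Dict[str, float]]
Bucket = Tuple[Optional[str], Dict[str, float]]


def apply_facet_options(
    buckets: Buckets,
    sort: Optional[List[Tuple[str, str]]] = None,  # e.g. [("sum", "desc")]
    limit: int = -1,
    missing: bool = True,
) -> List[Bucket]:
    items = list(buckets.items())
    if not missing:
        items = [kv for kv in items if kv[0] is not None]
    # Defaults: no sort, no limit, so the order is left exactly as it was.
    if not sort and limit < 0:
        return items

    named = [kv for kv in items if kv[0] is not None]
    missing_bucket = [kv for kv in items if kv[0] is None]

    # Apply sort clauses from last to first; list.sort is stable, so the
    # first clause ends up as the primary sort key.
    for sort_type, direction in reversed(sort or []):
        if sort_type == "index":
            key = lambda kv: kv[0]  # order by the facet value itself
        else:
            key = lambda kv, t=sort_type: kv[1][t]  # order by a statistic
        named.sort(key=key, reverse=(direction == "desc"))

    if limit >= 0:
        named = named[:limit]  # assumption: limit excludes the missing bucket
    return named + missing_bucket  # the missing bucket always comes last
```

For example, with buckets for `alice`, `bob`, `carol` and a missing bucket, `apply_facet_options(buckets, sort=[("sum", "desc")], limit=2)` keeps the two largest sums and appends the missing bucket after them.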

      Our example use case is employees' monthly salaries:

      The following query returns the 10 most "expensive" employees:
      "q=*:*&stats=true&stats.field=salary&stats.facet=employee_name&f.employee_name.stats.facet.sort=salary sum desc&f.employee_name.stats.facet.limit=10"
      The following query returns the 10 least "expensive" employees:
      "q=*:*&stats=true&stats.field=salary&stats.facet=employee_name&f.employee_name.stats.facet.sort=salary sum asc&f.employee_name.stats.facet.limit=10"
      The following query returns the employee who received the highest salary ever:
      "q=*:*&stats=true&stats.field=salary&stats.facet=employee_name&f.employee_name.stats.facet.sort=salary max desc&f.employee_name.stats.facet.limit=1"
      The following query returns the employee who received the lowest salary ever:
      "q=*:*&stats=true&stats.field=salary&stats.facet=employee_name&f.employee_name.stats.facet.sort=salary min asc&f.employee_name.stats.facet.limit=1"
      The following query returns the first 10 employees in lexicographic order:
      "q=*:*&stats=true&stats.field=salary&stats.facet=employee_name&f.employee_name.stats.facet.sort=employee_name index asc&f.employee_name.stats.facet.limit=10"
      The following query returns the 10 employees who have worked for the longest period:
      "q=*:*&stats=true&stats.field=salary&stats.facet=employee_name&f.employee_name.stats.facet.sort=employee_name count desc&f.employee_name.stats.facet.limit=10"
      The following query returns the 10 employees whose salaries vary the most:
      "q=*:*&stats=true&stats.field=salary&stats.facet=employee_name&f.employee_name.stats.facet.sort=salary countdistinct desc&f.employee_name.stats.facet.limit=10"
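
The queries above contain spaces in the sort value, so they need URL encoding before being sent. Below is a minimal sketch of building them with Python's standard library. The helper name `stats_facet_query` is hypothetical, the match-all query `*:*` is assumed for `q`, and the `f.<field>.stats.facet.sort` and `f.<field>.stats.facet.limit` parameters are the ones proposed in this issue, so they only take effect on a Solr build with the patch applied.

```python
from urllib.parse import urlencode


def stats_facet_query(stats_field: str, facet_field: str,
                      sort: str, limit: int, q: str = "*:*") -> str:
    """Build an encoded query string for a sorted, limited stats facet."""
    prefix = f"f.{facet_field}.stats.facet."
    params = [
        ("q", q),
        ("stats", "true"),
        ("stats.field", stats_field),
        ("stats.facet", facet_field),
        (prefix + "sort", sort),        # e.g. "salary sum desc"
        (prefix + "limit", str(limit)),
    ]
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return urlencode(params)


# The 10 most "expensive" employees:
query_string = stats_facet_query("salary", "employee_name", "salary sum desc", 10)
```

The resulting string can be appended to the select handler URL of whichever collection holds the salary documents.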

      A patch implementing this in StatsComponent is attached.

        Attachments

        1. SOLR-5972.patch
          40 kB
          Elran Dvir
        2. SOLR-5972.patch
          40 kB
          Elran Dvir
        3. SOLR-5972_multivalue_docvalue.patch
          2 kB
          Lyubov Romanchuk

              People

              • Assignee:
                Unassigned
                Reporter:
                elrand Elran Dvir
              • Votes:
                0
                Watchers:
                5

                Dates

                • Created:
                  Updated: