Solr / SOLR-1782

stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.4
    • Fix Version/s: 5.0
    • Component/s: search
    • Labels:
      None
    • Environment:

      reproduced on Win2k3 using 1.5.0-dev solr ($Id: CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)

      Description

      The StatsComponent assumes that any field specified in the stats.facet param can be faceted using FieldCache.DEFAULT.getStringIndex. This can cause problems with a variety of field types, but in the case of multivalued fields it can either produce erroneous stats when the number of distinct values is small, or throw an ArrayIndexOutOfBoundsException when the number of distinct values is greater than the number of documents.
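
      The miscounting on multivalued fields can be illustrated with a simplified model. The sketch below is not Lucene's code: it assumes only that a StringIndex-style structure keeps a single term slot per document and that, as terms are visited in sorted order, a later term overwrites an earlier one for the same document. All function names are hypothetical, and the ArrayIndexOutOfBoundsException case is not modeled.

```python
from collections import Counter


def build_single_slot_index(docs):
    """Model a one-term-per-document index (assumed StringIndex-like behavior).

    docs is a list where each entry holds the values of one document.
    Returns the sorted term list and one ordinal slot per document.
    """
    terms = sorted({value for values in docs for value in values})
    slot = [None] * len(docs)
    for ordinal, term in enumerate(terms):
        for doc_id, values in enumerate(docs):
            if term in values:
                # Only one slot per document: a later term overwrites an earlier one.
                slot[doc_id] = ordinal
    return terms, slot


def counts_from_single_slot(docs):
    """Per-value document counts as seen through the single-slot index."""
    terms, slot = build_single_slot_index(docs)
    return Counter(terms[ordinal] for ordinal in slot if ordinal is not None)


def true_facet_counts(docs):
    """Per-value document counts when every value of every document is counted."""
    return Counter(value for values in docs for value in set(values))


# Four documents with a multivalued field holding small integers.
docs = [[0, 1], [1, 2], [2], [0, 2]]
single_slot = counts_from_single_slot(docs)
expected = true_facet_counts(docs)
print("single-slot counts:", dict(single_slot))
print("true facet counts: ", dict(expected))
```

      In this toy data set the value 0 disappears from the single-slot counts and the value 1 is undercounted, which is the same kind of mismatch between stats.facet counts and facet.field counts that the issue reports.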

      New users interested in mixing stats & facets are encouraged to ignore the stats.facet param and instead combine stats.field with facet.pivot to achieve similar results more efficiently...

      https://cwiki.apache.org/confluence/display/solr/The+Stats+Component#TheStatsComponent-TheStatsComponentandFaceting
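
      As a rough illustration of the recommended alternative, the sketch below builds a request URL that ties stats.field to facet.pivot through a tag. It assumes a Solr version with pivot stats support (SOLR-6351, linked from this issue) and reuses the host and field names from the original report; the helper name pivot_stats_url and the tag name "piv" are hypothetical.

```python
from urllib.parse import urlencode


def pivot_stats_url(base_url, stats_field, pivot_field, tag="piv"):
    """Build a select URL that computes stats per pivot value instead of using stats.facet."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("stats", "true"),
        # Tag the stats field so the pivot can refer to it.
        ("stats.field", "{!tag=%s}%s" % (tag, stats_field)),
        ("facet", "true"),
        # Ask for the tagged stats to be computed for each pivot value.
        ("facet.pivot", "{!stats=%s}%s" % (tag, pivot_field)),
    ]
    return base_url + "?" + urlencode(params)


url = pivot_stats_url(
    "http://localhost:8080/solr/select",  # host and path taken from the original report
    "ValueOfOneField",
    "StatsFacetField",
)
print(url)
```

      The local params are URL-encoded by urlencode, so the printed URL can be pasted into a browser or passed to an HTTP client as-is.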

      Attachments

      1. index.rar (83 kB, Gerald DeConto)
      2. SOLR-1782.2.patch (12 kB, Wojtek Piaseczny)
      3. SOLR-1782.2013-01-07.patch (16 kB, David Christianson)
      4. SOLR-1782.2013-04-10.patch (21 kB, Steven Bower)
      5. SOLR-1782.patch (21 kB, Hoss Man)
      6. SOLR-1782.patch (15 kB, David Christianson)
      7. SOLR-1782.patch (12 kB, Wojtek Piaseczny)
      8. SOLR-1782.solr-4.2.1.patch (21 kB, Patanachai Tangchaisin)
      9. SOLR-1782.test.patch (5 kB, Hoss Man)

        Issue Links

          Activity

          Each change is listed as "Field: original value -> new value"; "(none)" marks an empty original value.

          Gerald DeConto created issue
          Gerald DeConto made changes
            Attachment: (none) -> index.rar [ 12436336 ]
          Hoss Man made changes
            Summary: unexpected statscomponent values -> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields
            Description, original value:
              I wanted to understand the StatsComponent better, so I set up a simple test index with a few thousand docs. In my schema I have:
              - an indexed multivalued sint field (StatsFacetField) that can contain the values 0 through 5 and that I want to use as my stats.facet field.
              - an indexed single-valued sint field (ValueOfOneField) that will always contain the value 1 and that I want stats on for this test.

              When I execute the following query:

              http://localhost:8080/solr/select?q=*:*&stats=true&stats.field=ValueOfOneField&stats.facet=StatsFacetField&rows=0&facet=on&facet.limit=10&facet.field=StatsFacetField

              For this query (*:*) I was expecting the StatsComponent count/sum values for each possible value in StatsFacetField to match the facet counts for StatsFacetField. They don't. Some are close (e.g. 204 vs 214) while others are way off (e.g. 230 vs 8000).
            Description, new value: the text that now forms the first paragraph of the Description above.
          Hoss Man made changes
            Attachment: (none) -> SOLR-1782.test.patch [ 12444168 ]
          Wojtek Piaseczny made changes
            Attachment: (none) -> SOLR-1782.patch [ 12447500 ]
          Wojtek Piaseczny made changes
            Attachment: (none) -> SOLR-1782.2.patch [ 12447630 ]
          Hoss Man made changes
            Link: (none) -> This issue is duplicated by SOLR-3642 [ SOLR-3642 ]
          David Christianson made changes
            Attachment: (none) -> SOLR-1782.patch [ 12561224 ]
          Mark Miller made changes
            Assignee: (none) -> Hoss Man [ hossman ]
          David Christianson made changes
            Attachment: (none) -> SOLR-1782.2013-01-07.patch [ 12563630 ]
          Hoss Man made changes
            Attachment: (none) -> SOLR-1782.patch [ 12566874 ]
          Steven Bower made changes
            Attachment: (none) -> SOLR-1782.2013-04-10.patch [ 12578232 ]
          Patanachai Tangchaisin made changes
            Attachment: (none) -> SOLR-1782.solr-4.2.1.patch [ 12654153 ]
          Hoss Man made changes
            Link: (none) -> This issue is duplicated by SOLR-6487 [ SOLR-6487 ]
          Hoss Man made changes
            Link: (none) -> This issue relates to SOLR-6351 [ SOLR-6351 ]
          Hoss Man made changes
            Status: Open [ 1 ] -> Resolved [ 5 ]
            Fix Version/s: (none) -> 5.0 [ 12327845 ]
            Resolution: (none) -> Won't Fix [ 2 ]
          Hoss Man made changes
            Description: kept the existing paragraph and appended, after a "---" separator, the note encouraging new users to ignore the stats.facet param and instead combine stats.field with facet.pivot, together with the link to the Stats Component documentation (the Description above shows the full current text).
          Anshum Gupta made changes
            Status: Resolved [ 5 ] -> Closed [ 6 ]

            People

            • Assignee: Hoss Man
            • Reporter: Gerald DeConto
            • Votes: 9
            • Watchers: 16
