Solr / SOLR-4499

StatsComponent could use some serious TLC

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Most of these problems are already documented on the wiki page, but here are my ideas for improving StatsComponent after reviewing it today.

      1. The external API should be made performant (e.g. some sort of paging for stats.facet, instead of returning ALL values).
      2. The code for multi-valued fields is clearly broken: it tries to use a combination of UnInvertedField and a single-valued FieldCache for multi-valued fields.
      3. The behavior for multi-valued fields could be unexpected: whether it's UnInvertedField or DocValues, these data structures return the unique set of ordinals for the document, so a value that occurs twice in a document is only seen once. I think it can therefore be very misleading to return stats like 'sum' for multi-valued fields.
      4. The stats returned should be implemented in ways that are fast. For example, the string case returns min/max, but does so by resolving every single ordinal to its term and calling String.compareTo. The ords are themselves comparable, so count/missing/min/max can all be satisfied with 2 ord->term lookups per segment. These are also the only stats I think multi-valued numerics should return (see above).
      5. Things like accumulate(NamedList) appear to have scary runtime (I think this one is only used for merging distributed results?). They should not call the O(n) get() method over and over in accumulate(), but instead do a single pass through the list.
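To illustrate item 4, here is a minimal sketch of ord-driven stats. It does not use the Lucene or Solr APIs: `Segment`, `Stats`, and `collect` are hypothetical stand-ins, and the sketch assumes each segment exposes a sorted term dictionary (array index equals ordinal) and one ordinal per document, with -1 meaning no value. Ordinals are only comparable within a segment, so the per-segment min and max terms are compared as strings when merging.

```java
import java.util.List;

/** Sketch: string stats computed from ordinals, with two term lookups per segment. */
final class OrdStatsSketch {

    /** Hypothetical stand-in for one index segment (not a Lucene class). */
    static final class Segment {
        final String[] sortedTerms; // term dictionary in sorted order; index == ordinal
        final int[] docOrds;        // one ordinal per document, -1 when the doc has no value

        Segment(String[] sortedTerms, int[] docOrds) {
            this.sortedTerms = sortedTerms;
            this.docOrds = docOrds;
        }
    }

    /** Accumulated result across all segments. */
    static final class Stats {
        long count;
        long missing;
        String min; // null until at least one value has been seen
        String max;
    }

    static Stats collect(List<Segment> segments) {
        Stats stats = new Stats();
        for (Segment seg : segments) {
            int minOrd = Integer.MAX_VALUE;
            int maxOrd = -1;
            // Per document: integer comparisons only, no term lookups.
            for (int ord : seg.docOrds) {
                if (ord < 0) {
                    stats.missing++;
                    continue;
                }
                stats.count++;
                if (ord < minOrd) minOrd = ord;
                if (ord > maxOrd) maxOrd = ord;
            }
            if (maxOrd < 0) {
                continue; // no document in this segment has a value
            }
            // Exactly two ord->term lookups for the whole segment.
            String segMin = seg.sortedTerms[minOrd];
            String segMax = seg.sortedTerms[maxOrd];
            // Ordinals are segment-local, so merge across segments by term.
            if (stats.min == null || segMin.compareTo(stats.min) < 0) stats.min = segMin;
            if (stats.max == null || segMax.compareTo(stats.max) > 0) stats.max = segMax;
        }
        return stats;
    }
}
```

With this shape, the number of string comparisons grows with the number of segments rather than the number of documents.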

      Finally, the code is pretty difficult to follow, and the tests are inadequate for everything that is going on here.
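As a rough illustration of item 5 in the list above, the sketch below merges one shard's stats in a single pass over its name/value pairs instead of calling a linear-time get(name) once per stat. It is not the StatsComponent code: a `List<Map.Entry<String, Object>>` stands in for a NamedList-style ordered pair list, and the class name, the `pair` helper, and the stat names `sum`, `count`, `missing`, `min`, `max` are assumptions made for the example.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;

/** Sketch: merge per-shard numeric stats with one pass over each shard's pair list. */
final class SinglePassAccumulateSketch {
    double sum;
    long count;
    long missing;
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;

    /** Visits each name/value pair once, so the cost is linear in the list size. */
    void accumulate(List<Map.Entry<String, Object>> shardStats) {
        for (Map.Entry<String, Object> e : shardStats) {
            String name = e.getKey();
            if (name == null) {
                continue; // pair lists of this kind may contain unnamed entries
            }
            Object v = e.getValue();
            switch (name) {
                case "sum":
                    sum += ((Number) v).doubleValue();
                    break;
                case "count":
                    count += ((Number) v).longValue();
                    break;
                case "missing":
                    missing += ((Number) v).longValue();
                    break;
                case "min":
                    min = Math.min(min, ((Number) v).doubleValue());
                    break;
                case "max":
                    max = Math.max(max, ((Number) v).doubleValue());
                    break;
                default:
                    break; // ignore entries this sketch does not model
            }
        }
    }

    /** Hypothetical helper for building a name/value pair. */
    static Map.Entry<String, Object> pair(String name, Object value) {
        return new SimpleImmutableEntry<>(name, value);
    }
}
```

Looking up k stats by name in a list of n pairs costs on the order of k times n comparisons per shard response; the loop above touches each pair once.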


            People

              Assignee: Unassigned
              Reporter: Robert Muir (rcmuir)
              Votes: 1
              Watchers: 6
