
StatsComponent could use some serious TLC


    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:


      Most of these problems are already documented on the wiki page, but here are my ideas for improving the component after reviewing it today.

      1. The external API should be made performant (e.g. some sort of paging for stats.facet, rather than returning ALL values).
      2. The code for multi-valued fields is clearly broken: it tries to use a combination of UninvertedField with a single-valued fieldcache for multi-valued fields.
      3. The behavior for multi-valued fields could be unexpected: whether it's UninvertedField or DocValues, these data structures return the unique set of ordinals for the document, so a value repeated within a document is only seen once. I think it can therefore be very misleading to return stats like 'sum' for multi-valued fields.
      4. The stats returned should be implemented in ways that are fast. For example, the string case returns min/max, but does this by looking up the term for every single ordinal and using String.compareTo. The ords are themselves comparable, so count/missing/min/max can all be satisfied with 2 ord->term lookups per segment. These are also the only stats I think multi-valued numerics should return (see above).
      5. Things like accumulate(NamedList) appear to have a scary runtime (I think this one is only used for merging distributed results?). Instead of calling the O(n) get() method over and over, accumulate() should do a single pass through the list.
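
To illustrate point 5, here is a minimal sketch of a single-pass merge. It does not use Solr's NamedList; a plain list of name/value pairs stands in for it, and the class name, the Stats holder, the pairs() helper and the stat names ("count", "missing", "sum", "min", "max") are all assumptions made for the example. The idea is only that each entry is visited once and dispatched by name, instead of calling a linear-scan get() once per stat.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the merge step; not the actual StatsComponent code.
class SinglePassAccumulate {

    // Hypothetical holder for the merged stats.
    static final class Stats {
        long count;
        long missing;
        double sum;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
    }

    // Builds an ordered list of name/value pairs from alternating arguments.
    static List<Map.Entry<String, Object>> pairs(Object... nameThenValue) {
        List<Map.Entry<String, Object>> out = new ArrayList<>();
        for (int i = 0; i + 1 < nameThenValue.length; i += 2) {
            out.add(new SimpleEntry<String, Object>((String) nameThenValue[i], nameThenValue[i + 1]));
        }
        return out;
    }

    // One pass over the shard response: each entry is visited exactly once,
    // so the cost is O(n) in the number of entries rather than O(n) per stat.
    static void accumulate(Stats into, List<Map.Entry<String, Object>> shard) {
        for (Map.Entry<String, Object> e : shard) {
            Number v = (Number) e.getValue();
            switch (e.getKey()) {
                case "count":   into.count += v.longValue(); break;
                case "missing": into.missing += v.longValue(); break;
                case "sum":     into.sum += v.doubleValue(); break;
                case "min":     into.min = Math.min(into.min, v.doubleValue()); break;
                case "max":     into.max = Math.max(into.max, v.doubleValue()); break;
                default:        break; // unknown stats are ignored in this sketch
            }
        }
    }

    public static void main(String[] args) {
        Stats merged = new Stats();
        accumulate(merged, pairs("count", 3L, "missing", 1L, "sum", 6.0, "min", 1.0, "max", 3.0));
        accumulate(merged, pairs("min", -2.0, "count", 2L, "max", 0.5, "sum", -1.5, "missing", 0L));
        System.out.println(merged.count + " " + merged.missing + " " + merged.sum
                + " " + merged.min + " " + merged.max);
    }
}
```

Dispatching on the entry name also means the merge does not depend on the order in which a shard happens to emit its stats.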

      Finally, the code is pretty difficult to follow, and the tests are inadequate for everything that is going on here.
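
As a rough illustration of point 4, the sketch below computes count/missing/min/max for a string field by comparing ordinals only. It deliberately avoids the Lucene API: a sorted String[] plays the role of one segment's ord->term dictionary, an int[] holds one ordinal per document with -1 meaning "no value", and the class and method names are made up for the example. The point is that terms are resolved at most twice per segment, however many documents there are.

```java
// Hypothetical model of one segment; not the actual Lucene/Solr data structures.
class OrdMinMax {

    // Hypothetical result holder for a single segment.
    static final class Result {
        long count;
        long missing;
        String min; // stays null if no document has a value
        String max;
    }

    // sortedTerms: ord -> term, in sorted order, so comparing ords is
    //              equivalent to comparing the terms themselves.
    // docOrds:     one ordinal per document, -1 when the document has no value.
    static Result compute(String[] sortedTerms, int[] docOrds) {
        Result r = new Result();
        int minOrd = Integer.MAX_VALUE;
        int maxOrd = -1;
        for (int ord : docOrds) {
            if (ord < 0) {
                r.missing++;
                continue;
            }
            r.count++;
            if (ord < minOrd) minOrd = ord; // integer compare, no term lookup
            if (ord > maxOrd) maxOrd = ord;
        }
        if (r.count > 0) {
            // The only two ord->term lookups for this segment.
            r.min = sortedTerms[minOrd];
            r.max = sortedTerms[maxOrd];
        }
        return r;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "date"};
        int[] docOrds = {2, -1, 1, 3, 1};
        Result r = compute(terms, docOrds);
        System.out.println(r.count + " " + r.missing + " " + r.min + " " + r.max);
    }
}
```

Ordinals are only meaningful within a single segment, so combining segments would still need to compare the resolved min and max terms, which is one string comparison per segment for each of min and max rather than one per document.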



            • Assignee:
              rcmuir Robert Muir

