Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Most of these problems are already documented on the wiki page, but here are my ideas for improving this code after reviewing it today.
- The external API should be made performant (e.g. some sort of paging for the stats.facet, vs returning ALL values)
- The code for multi-valued fields is clearly broken: it tries to use a combination of UninvertedField with a single-valued fieldcache for multivalued fields.
- The behavior for multi-valued fields could be unexpected: whether it's UninvertedField or DocValues, these data structures return the unique set of ordinals for the document. So I think it can be very misleading to return stats like 'sum' for multivalued fields.
- The stats returned should be implemented in ways that are fast. For example, the string case returns min/max, but does this by looking up the term for every single ordinal and using String.compareTo. The ords are themselves comparable, so count/missing/min/max can all be satisfied with 2 ord->term lookups per segment. These are also the only stats I think multi-valued numerics should return (see above).
- Things like accumulate(NamedList) appear to have scary runtime (I think this one is only used for merging distributed results?). They should not use the O(n) get() method over and over in accumulate() but instead do a single pass through the list.
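To illustrate the last point, here is a minimal sketch of a single-pass merge. It does not use Solr's NamedList; a plain list of name/value pairs stands in for it, and the class name `SinglePassAccumulate`, the `Totals` holder, and the `shard` helper are all hypothetical. The idea is only that each entry is visited once and dispatched on its name, instead of calling a linear-time get() once per stat.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SinglePassAccumulate {
    /** Running totals merged across shard responses (hypothetical holder). */
    static final class Totals {
        long count;
        long missing;
        double sum;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
    }

    /** Merges one shard's stats by walking the pair list exactly once. */
    static void accumulate(Totals t, List<Map.Entry<String, Object>> shardStats) {
        for (Map.Entry<String, Object> e : shardStats) {
            Number v = (Number) e.getValue();
            switch (e.getKey()) {
                case "count":   t.count += v.longValue(); break;
                case "missing": t.missing += v.longValue(); break;
                case "sum":     t.sum += v.doubleValue(); break;
                case "min":     t.min = Math.min(t.min, v.doubleValue()); break;
                case "max":     t.max = Math.max(t.max, v.doubleValue()); break;
                default:        break; // entries this sketch does not model
            }
        }
    }

    /** Builds a stand-in for one shard's response (assumed layout). */
    static List<Map.Entry<String, Object>> shard(long count, long missing,
                                                 double sum, double min, double max) {
        List<Map.Entry<String, Object>> list = new ArrayList<>();
        list.add(new SimpleEntry<String, Object>("min", min));
        list.add(new SimpleEntry<String, Object>("max", max));
        list.add(new SimpleEntry<String, Object>("count", count));
        list.add(new SimpleEntry<String, Object>("missing", missing));
        list.add(new SimpleEntry<String, Object>("sum", sum));
        return list;
    }

    public static void main(String[] args) {
        Totals t = new Totals();
        accumulate(t, shard(3, 1, 6.0, 1.0, 3.0));
        accumulate(t, shard(2, 0, 9.0, 4.0, 5.0));
        System.out.println(t.count + " " + t.missing + " " + t.sum + " " + t.min + " " + t.max);
    }
}
```

With k stats and a list of n entries, repeated get() calls cost roughly k * n comparisons per merge, while the loop above costs n.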
Finally, the code is pretty difficult to follow, and the tests are inadequate for everything that is going on here.
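The string min/max idea above can be sketched as follows. This is not Lucene code: the `Segment` class is a hypothetical stand-in for a per-segment sorted-terms view, where `sortedTerms[ord]` plays the role of the ord->term lookup and `docOrds` holds one ord per document (-1 when the value is missing). The per-document work is integer comparison only, and each segment costs at most two term lookups.

```java
import java.util.Arrays;
import java.util.List;

final class OrdMinMax {
    /** Hypothetical stand-in for one segment's sorted-terms view. */
    static final class Segment {
        final String[] sortedTerms; // ord -> term, in sorted order
        final int[] docOrds;        // per-document ord, -1 when missing

        Segment(String[] sortedTerms, int[] docOrds) {
            this.sortedTerms = sortedTerms;
            this.docOrds = docOrds;
        }
    }

    static final class Result {
        long count;
        long missing;
        String min;
        String max;
    }

    static Result stats(List<Segment> segments) {
        Result r = new Result();
        for (Segment s : segments) {
            int minOrd = Integer.MAX_VALUE;
            int maxOrd = -1;
            // Ords sort the same way as their terms, so compare ints only.
            for (int ord : s.docOrds) {
                if (ord < 0) {
                    r.missing++;
                    continue;
                }
                r.count++;
                if (ord < minOrd) minOrd = ord;
                if (ord > maxOrd) maxOrd = ord;
            }
            if (maxOrd < 0) continue; // no values in this segment
            // The only two ord->term lookups for this segment.
            String segMin = s.sortedTerms[minOrd];
            String segMax = s.sortedTerms[maxOrd];
            // Ords are not comparable across segments, so merge by term.
            if (r.min == null || segMin.compareTo(r.min) < 0) r.min = segMin;
            if (r.max == null || segMax.compareTo(r.max) > 0) r.max = segMax;
        }
        return r;
    }

    public static void main(String[] args) {
        Segment a = new Segment(new String[] {"apple", "kiwi", "pear"}, new int[] {2, -1, 0, 1});
        Segment b = new Segment(new String[] {"banana", "zebra"}, new int[] {1, 1, -1});
        Result r = stats(Arrays.asList(a, b));
        System.out.println(r.count + " " + r.missing + " " + r.min + " " + r.max);
    }
}
```

String comparison still happens, but only once or twice per segment when merging, rather than once per document.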
Issue Links
- relates to SOLR-5302 Analytics Component (Closed)