Description
While profiling a large collection (multi-shard, billions of documents), I found that a fast query (5-10 ms) with no matches would take 20-30 seconds once faceting was enabled, even with facet.mincount=1.
Profiling made it apparent that with facet.method=fcs 99% of the time was spent here.
queue.updateTop gets called up to numOfSegments*numTerms times; the worst case is when every term is in every segment. This formula doesn't take into account whether any of the terms has a positive count with respect to the docset.
These optimizations aim to do two things:
- When mincount>0, don't include segments in which all terms have zero counts. This should significantly speed up processing when the terms are high cardinality and the matching docset is small.
- Fixed a TODO optimization: when mincount>0, move the segment position to the next term with a non-zero count.
Both of these changes minimize the number of calls to the slow updateTop.
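The two changes above can be illustrated with a minimal, self-contained sketch. This is not the Solr patch itself: the `Segment` class, the `mergeCounts` method, and the `queueOps` counter are hypothetical names made up for illustration, and `java.util.PriorityQueue` with `poll`/`add` stands in for Lucene's queue and its `updateTop` call. It only assumes that each segment exposes its sorted terms and a per-term count for the matching docset.

```java
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

final class SegmentFacetMergeSketch {

    /** Hypothetical stand-in for one segment's terms and per-term counts. */
    static final class Segment {
        final String[] terms;  // sorted ascending, unique within the segment
        final int[] counts;    // counts[i] = hits for terms[i] in the matching docset
        int pos;               // current position in terms

        Segment(String[] terms, int[] counts) {
            this.terms = terms;
            this.counts = counts;
        }

        /** Positions on a term; when skipZeros is set, skips terms whose count is zero. */
        boolean seek(boolean skipZeros) {
            while (skipZeros && pos < terms.length && counts[pos] == 0) {
                pos++;
            }
            return pos < terms.length;
        }

        String current() {
            return terms[pos];
        }
    }

    /** Merged counts plus the number of queue operations that were needed. */
    static final class Result {
        final Map<String, Integer> counts = new TreeMap<>();
        long queueOps;
    }

    static Result mergeCounts(List<Segment> segments, int mincount) {
        boolean skipZeros = mincount > 0;
        Result result = new Result();
        PriorityQueue<Segment> queue =
            new PriorityQueue<>((a, b) -> a.current().compareTo(b.current()));

        // Change 1: a segment whose counts are all zero never enters the queue.
        for (Segment seg : segments) {
            seg.pos = 0;
            if (seg.seek(skipZeros)) {
                queue.add(seg);
                result.queueOps++;
            }
        }

        while (!queue.isEmpty()) {
            String term = queue.peek().current();
            int total = 0;
            // Drain every segment currently positioned on this term.
            while (!queue.isEmpty() && queue.peek().current().equals(term)) {
                Segment seg = queue.poll();
                total += seg.counts[seg.pos];
                seg.pos++;
                // Change 2: jump straight to the next term with a non-zero count.
                if (seg.seek(skipZeros)) {
                    queue.add(seg);
                }
                result.queueOps++; // poll + add approximates one updateTop
            }
            if (total >= mincount) {
                result.counts.put(term, total);
            }
        }
        return result;
    }
}
```

With mincount=0 every term of every segment passes through the queue, which matches the numOfSegments*numTerms worst case described above. With mincount>0 the number of queue operations depends only on the terms that actually have hits in the docset.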
Activity
Thanks, Dave. I think I've been marking issues as closed. I'll keep this in mind going forward.