Multiple performance enhancements to Solr String faceting.
- Sparse counters, replacing the overhead of extracting the top-X terms, which is fixed regardless of result set size, with an overhead linear in the result set size
- Counter re-use for reduced garbage collection and lower per-call overhead
- Optional counter packing, trading speed for space
- Improved count logic for distributed requests, greatly speeding up distributed faceting
- In-segment threaded faceting
- Regexp-based whitelisting and blacklisting of facet terms
- Heuristic faceting for large result sets
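The sparse-counter and counter re-use ideas can be sketched in Java. This is a minimal illustration under stated assumptions, not the code from the linked repository: the class name `SparseCounter`, the tracker capacity and the fall-back to a full scan when the tracker fills up are all hypothetical choices made for this example, and terms are assumed to be identified by integer ordinals. Counts live in a dense `int[]`; a second array records each ordinal the first time its count leaves zero, so extracting the top-X terms and resetting the counter for re-use only touch those ordinals instead of every unique term in the field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sparse facet counter: dense counts plus a tracker of touched ordinals. */
final class SparseCounter {
    private final int[] counts;   // one counter per unique term ordinal
    private final int[] touched;  // ordinals whose count went from 0 to 1 since last reset
    private int touchedSize = 0;
    private boolean overflowed = false; // tracker full: fall back to dense behavior

    SparseCounter(int uniqueTerms, int trackerCapacity) {
        counts = new int[uniqueTerms];
        touched = new int[trackerCapacity];
    }

    /** Increments the count for an ordinal, tracking it on its first hit. */
    public void increment(int ordinal) {
        if (counts[ordinal]++ == 0 && !overflowed) {
            if (touchedSize < touched.length) {
                touched[touchedSize++] = ordinal;
            } else {
                overflowed = true; // too many distinct hits to track sparsely
            }
        }
    }

    /** Returns up to topX ordinals, highest count first; ties go to the lower ordinal. */
    public List<Integer> topX(int topX) {
        // Orders by count ascending, then by ordinal descending, so the heap head
        // is always the weakest candidate and is evicted first.
        Comparator<Integer> weakestFirst = (a, b) -> counts[a] != counts[b]
                ? Integer.compare(counts[a], counts[b])
                : Integer.compare(b, a);
        PriorityQueue<Integer> heap = new PriorityQueue<>(weakestFirst);
        if (overflowed) {
            for (int ord = 0; ord < counts.length; ord++) offer(heap, ord, topX);
        } else {
            // Sparse path: cost depends on the number of touched ordinals only.
            for (int i = 0; i < touchedSize; i++) offer(heap, touched[i], topX);
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(weakestFirst.reversed());
        return result;
    }

    private void offer(PriorityQueue<Integer> heap, int ordinal, int topX) {
        if (counts[ordinal] == 0) return; // untouched terms never make the result
        heap.offer(ordinal);
        if (heap.size() > topX) heap.poll(); // drop the weakest candidate
    }

    /** Clears the counter for re-use, touching only the tracked ordinals when possible. */
    public void reset() {
        if (overflowed) {
            Arrays.fill(counts, 0);
        } else {
            for (int i = 0; i < touchedSize; i++) counts[touched[i]] = 0;
        }
        touchedSize = 0;
        overflowed = false;
    }
}
```

Because `reset()` leaves both arrays allocated, the same instance can serve the next request, which is the kind of saving the counter re-use bullet refers to. The extra tracker array costs memory, so a real implementation would presumably cap it at some fraction of the unique-term count; past that point a dense scan is likely no worse than maintaining the tracker.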
Currently implemented for Solr 4.10. Source code, a detailed description and a ready-to-use WAR are available at http://tokee.github.io/lucene-solr/
This project has grown beyond a simple patch and will require a fair amount of co-operation with a committer to get into Solr. Splitting into smaller issues is a possibility.