The code seems to be in working state for Solr 4.8 and 4.9 for Field Cache faceting (single- and multi-value) and DocValue faceting (single- and multi-value) for Strings. It really needs testing by someone else than me, so that the validity of the responses and the speed upc can be verified. I am willing to port this to trunk if a committer is willing to work on getting the patch into Solr, but until then I will stick to stable versions as that is what we use locally.
Sparse faceting introduces some architectural changes that needs to be addressed.
1) The core sparse counting itself is surprisingly non-invasive. It could be a standalone patch. However, this only works really well in a non-distributed setting and has less effect in a distributed one.
2) Avoiding re-allocation of counter structures by maintaining a pool of structures lowers the minimum faceting time and lowers the need for GC. It is lowered quite a lot, I might add, as faceting is normally one of the big GC-triggers, so I would strongly recommend this feature.
Such a pool is very much like a cache and as such must play nice with index updates and other caching structures in Solr. The current state of the counter pool is that it works well, but is implemented independently of the usual caches. It is probably better to latch on to the standard caching mechanisms in Solr, but I have no experience with that part of the code base. The code for the pool itself is fairly simple and can be implemented as an experimental feature by setting the default pool size to 0.
3) Fast distributed faceting is about speeding up the secondary phase of distrubuted faceting: The fine count for specific terms. This is achieved by replacing the normal search-for-each-requested-term approach by a full faceted count approach, where the counts for all terms are calculated but only the counts for the specified terms are returned. This requires either a separation of counting and extraction in stock Solr code or an extension to the count-extract code block so that is can handle both the case where top-X terms are wanted and where specific terms are requested. Currently the sparse code use the latter approach, but the former is a much cleaner architecture. Both solutions does require some re-factoring and are a bit tricky to get right.
4) A further speed up for distributed faceting can be achieved by upgrading the sparse pools to real caches, so that they contain a mix of empty counters (to avoid GC) and filled counters (caching of counts). As the counting part of the second phase is always exactly the same as for the first phase, cache hit-rate should be very high with distribution. This is not currently implemented and is a bit tricky to get to work well.
I recommend that this improvement is postponed, in order to keep the current patch manageable.
The sparse principle would also work for Field Cache per Segment faceting of Strings, but no code has been written for this yet. I recommend that this is also postponed, to get things moving.