We are seeing faceting on multi-valued fields as a significant performance problem, so we'd very much like to see something of this sort.
Yonik,
This looks great! I'd like to contribute (unit test, etc.) to move forward. Before I write unit tests, I have a couple of questions:
I've finished this implementation and am cleaning it up for contribution.
In the meantime, I'm attaching the results of some performance tests.
Here are some further results on a bigger index, to show some practical limits.
This table (JIRA markup format) shows the performance and memory characteristics of facet requests on a 50M document index, for different fields and different numbers of documents being counted in the base query.
The "profile" of the faceted field is encoded in its name. For example, the field f1000_5_t has 1000 unique values across the whole index and between 0 and 5 values per document. It took 35 ms to facet on this field when the base query matched 100,000 documents.
Test Hardware: Commodity PC
How amazing! facet_performance.html shows great improvement in both memory and qps. Great job, Yonik!
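For readers working through the results, the field-name convention described above (f1000_5_t meaning 1000 unique values and 0 to 5 values per document) can be decoded mechanically. The following is only a sketch: the names `FieldProfile` and `parse_field_profile` are hypothetical and not part of the patch, and the trailing suffix (here `t`) is carried through uninterpreted because the thread does not say what it stands for.

```python
import re
from typing import NamedTuple


class FieldProfile(NamedTuple):
    """Hypothetical container for the numbers encoded in a test field name."""
    unique_values: int        # unique values across the whole index
    max_values_per_doc: int   # upper bound on values per document (lower bound is 0)
    suffix: str               # trailing part of the name, left uninterpreted


# Assumed pattern: "f<unique values>_<max values per doc>_<suffix>"
_FIELD_NAME = re.compile(r"^f(\d+)_(\d+)_(\w+)$")


def parse_field_profile(name: str) -> FieldProfile:
    """Decode a field name such as 'f1000_5_t' into its profile."""
    match = _FIELD_NAME.match(name)
    if match is None:
        raise ValueError(f"not a profile-encoded field name: {name!r}")
    return FieldProfile(int(match.group(1)), int(match.group(2)), match.group(3))


profile = parse_field_profile("f1000_5_t")
print(profile.unique_values, profile.max_values_per_doc, profile.suffix)
```

A helper like this could be used to group or sort the rows of a results table by the size of the field rather than by its raw name.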
Attaching patch with tests.
This is well tested, so I'll probably commit relatively soon. update:
This is completely untested code, and is still missing the Solr interface and caching.
The approach is described in the comments (cut-n-pasted here).
Any thoughts or comments on the approach?
I may not have time to work on this immediately (fix the bugs, add tests, hook up to Solr, add caching of the un-inverted field, etc.), so additional contributions in this direction are welcome!