LUCENE-3097 is about "post group faceting", while this issue is LUCENE-3079. I assume you meant the latter, but I want to confirm. You also mention LUCENE-2309, which is about decoupling IndexWriter from Analyzers. Are you perhaps referring to a Solr issue, or a different Lucene issue? If so, can you please let me know which one?
This is a great test, and it more or less matches the one we've been running. Is it in 'benchmark' form? Can you post it on this issue so I can try the same?
What do you mean by "top 5 facets/tags"? If I were to speak of dimensions, where a dimension is something like "tags", "authors" or "date", do you mean you requested counts for 5 dimensions, or that you indexed just one dimension (i.e., one "root") and requested the top-5 results for it? I assume it's the latter, but again, I'm confirming my understanding.
So, assuming I understood the terminology and test setup correctly: you execute one query which matches 50% of the documents, ask to count the top-5 facets under a single "root"/"dimension", and record the time as 'first facet request'. Then you execute it 4-5 additional times and record the 'best of 5 requests'. Do I understand it correctly?
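Just to make sure we're talking about the same protocol, here is a minimal sketch of the timing I have in mind. This is not your benchmark code; `runFacetRequest()` is a hypothetical placeholder for the actual query-and-count call:

```java
// Sketch of the measurement protocol as I understand it: time the first
// request separately (cold caches), then take the best of 5 further runs.
public class FacetTimingSketch {

    // Hypothetical placeholder for: run the query matching ~50% of the
    // documents and count the top-5 facets under a single dimension.
    static void runFacetRequest() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i; // stand-in workload
        if (sum < 0) throw new AssertionError();
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        runFacetRequest();
        long firstNs = System.nanoTime() - t0;      // 'first facet request'

        long bestNs = Long.MAX_VALUE;
        for (int i = 0; i < 5; i++) {               // 'best of 5 requests'
            long s = System.nanoTime();
            runFacetRequest();
            bestNs = Math.min(bestNs, System.nanoTime() - s);
        }
        System.out.println(firstNs > 0 && bestNs > 0);
    }
}
```

If that matches what you did, then the 'first facet request' number includes all one-time warm-up costs, which is relevant to my FieldCache comment below.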
One difference between the two approaches (assuming the approach you're comparing against uses the FieldCache) is that by default, the faceting approach here reads everything from disk. So it would be interesting to also run with the facets-in-memory feature.
I don't know how to interpret the memory usage: in the last test it consumed 50% less than the other approach, in the first it consumed nearly the same, and in the second it consumed 150% more. This is odd. Do you trust this measurement?
The 'first facet request' result is not surprising, because it takes time to warm up the FieldCache (assuming that's what you use).
I am also interested in the memory observed during indexing, because that too seems to fluctuate: in the second test the difference is nearly 20x, which is weird.
The difference in indexing time is interesting too, as it is also not very consistent, and I find the 2x factor suspicious; I would like to understand it better. Since trunk reportedly improves indexing speed by a large factor (nearly 200%), I think it would be wise to postpone this comparison until I bring the patch up to date with trunk.
I like that you test the default behavior. I think it's very important that we offer the best out-of-the-box experience. Since one approach reads from disk and the other from memory, I would first like to test the in-memory facets with this benchmark, so that we at least compare like with like. I know that trunk plays some role here (definitely at indexing time), so let's focus on search time for now.
This is great stuff, Toke!