Wojtek & David...
I spent some time today trying to look over the latest patch (
SOLR-1782.2013-01-07.patch ) and had some trouble wrapping my head arround it.
In particular i think the crux of my confusion stems from this comment from Wojtek...
New patch uses both UninvertedField and FieldCache.DocTermsIndex for multi-valued facet fields
...which doens't really make sense to me.
If a field is multivalued, then (unless i'm missunderstanding something) a DocTermsIndex won't make sense for that field at all. At best it's wasted RAM to build up the DocTermsIndex when UnInvertedField is going to be used instead – but in this patch, as part of the "if (null == uif)" sprinkled throughout FieldFacetStats, it looks like term ordinals from the UnInvertedField.getTermNumbers method are then being used to look Term values from the DocTermsIndex ... i don't understand that at all.
Hopefully this is just a minor bug in the patch, and the call to "si.lookup" in the UIF block(s) of FieldFacetStats.facet & FieldFacetStats.facetTermNum are ment to be calls to some similar method in UnivertedField, but it's not entirely clear.
If that is the case, then it suggests to me that a better way to organize this code so that it's more clear what's going on (and to save all that wasted RAM used by the DocTermsIndex when going down the UIF code path!) would be to refactor an abstract base class out of FieldFacetStats defining the API contract and then have two concrete impls: one using DocTermsIndex provided in teh constructor, and one using an UnInvertedField provided in it's constructor.
I'm also a little perplexed by the "UnInvertedField.getStats" method which is modified in this patch, but seemingly only as an aside so it will still compile because of the new param added to the FieldFacetStats constructor. At a glance, I'm not clear on wether this method is even used anywhere, but that makes it seem all the more suspicious: it appears that a lot of the logic in getTermNumbers was cut/paste from this method – should these methods be refactored to eliminate that duplication? can UIF.getStats just be deprecated & removed? Would be make more sense instead of adding some of these new methods to just "fix" UIF.getStats so that it recognizes when another UIF instance is needed for the facet field(s)?
Lastly: I noticed that while the patch did include tests, it didn't include the original tests I wrote when this issue was filled (see
SOLR-1782.test.patch). When I attempted to merge those tests in, I got a failure in testMicroStatsWithMultiValuedFacetField (even after i fixed the obvious case sensitivity problem in the original test) – hopefully this is just a demonstration of the problem i mentioned above about mixing ords from the DocTermsIndex and the UnInvertedField, but it may be a symptom of a diff problem.
unified patch will all tests attached