While the behavior I started looking at is arguably benign-but-annoying (if inefficient), there is bona fide badness here. If you specify the same facet.query=blah twice in distributed mode, the result is twice the count it should be.
Try this: URL on the test docs, simple 2-shard setup:
The only non-zero entry is
Now duplicate the field by adding another copy of:
the count goes to 22
Admittedly this is pilot error, but it's still incorrect.
I looked at de-duping in the facet component only, but that ran into weird NullPointerExceptions. Given that I slightly favored de-duping at a higher level anyway, I'm not willing to pursue that avenue unless there's a really good counter-argument.
The behavior is different because the distributed case is, of course, a separate code path. Part of that path is FacetComponent.parse (roughly line 1000 in trunk). That method collects the facet.field parameters into a LinkedHashMap, so multiple entries are weeded out. When a field is specified twice, the params are still passed to the shards, so the counts are calculated twice and returned twice. It just happens that the facets member var (which collects the names of the fields) only has one entry for the multiply-specified field, since it's a HashMap, so it only asks the response packet for that field's facet counts once per shard. Since the duplicate responses are identical, the result happens to be OK.
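To make the accident concrete, here's a minimal sketch (plain Java stand-ins, not the actual Solr classes) of why duplicated facet.field entries end up harmless: collecting field names into a map keyed by name collapses the duplicates, so the aggregation only reads each field's count from a shard response once.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldDedupeSketch {
    // Hypothetical stand-in for the field-name collection in
    // FacetComponent.parse: keying by field name means the second copy
    // of a field simply overwrites the first, so the shard response is
    // consulted only once per field.
    static Map<String, Integer> aggregate(String[] requestedFields,
                                          Map<String, Integer> shardCounts) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        for (String f : requestedFields) {
            fields.put(f, 0); // duplicate names collapse to one entry
        }
        for (String f : fields.keySet()) {
            fields.put(f, shardCounts.get(f)); // read each field's count once
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Integer> shard = new LinkedHashMap<>();
        shard.put("cat", 11);
        // "cat" requested twice, but the map keeps one entry: count stays 11
        Map<String, Integer> merged = aggregate(new String[] {"cat", "cat"}, shard);
        System.out.println(merged.get("cat")); // prints 11
    }
}
```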
However, take rangeFacets (roughly line 981 in FacetComponent), which is defined as a SimpleOrderedMap. That code checks whether there is already an entry for the field in question... and if there is, it adds to the existing entry.
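That add-if-present merge is exactly where the doubling comes from. A minimal sketch (again a plain-Java stand-in, not the real SimpleOrderedMap code) of merging a shard's identical response twice:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RangeMergeSketch {
    // Hypothetical stand-in for the range-facet merge: if an entry for
    // the field already exists, the incoming count is ADDED to it
    // rather than replacing it.
    static void mergeShardEntry(Map<String, Integer> merged, String field, int count) {
        merged.merge(field, count, Integer::sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        // The same range param specified twice means the shard's
        // identical response gets merged twice:
        mergeShardEntry(merged, "price:[0 TO 100]", 11);
        mergeShardEntry(merged, "price:[0 TO 100]", 11);
        System.out.println(merged.get("price:[0 TO 100]")); // prints 22, not 11
    }
}
```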
I think this adds weight to the somewhat invasive change I first tried. I don't know how many of these spots are hanging around; based on their being SimpleOrderedMaps, I'm guessing date facets, pivot facets, and interval facets at least have problems.
So rather than finding them one-by-one, I'm inclined to do the de-duping at a higher level.
This is easy to demonstrate in the distributed tests: just hack TestDistributedSearch to duplicate one of the facet.range parameters (or one of the other facet types that use SimpleOrderedMaps) in the queries. Tests start failing.
I'll probably attach a new version of the patch tonight (running tests now). It's just like the old one, except it adds duplicated fields to the distributed facets test (which will fail without the patch).
I spent some time trying to move the dedupe code into FacetComponent, but it's more tangled than I think is worth the effort. It's not difficult to dedupe the parameters there, since they're passed through in a SolrParams object. So far so good. But somehow that generates a NullPointerException down the road. I think the problem is that code elsewhere makes assumptions about how many entries should be present based on the original parameters in the request. Or something like that. De-duping at the highest level seems to avoid this altogether, and I'm very afraid that fixing these spots one at a time is an N+1 problem.
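For the top-level approach, the mechanical part is simple: dedupe each multi-valued parameter before the request fans out to the shards. A sketch, using plain arrays rather than the actual SolrParams API (the parameter values here are illustrative):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;

public class ParamDedupeSketch {
    // Dedupe a multi-valued request parameter while preserving the
    // order of first occurrence. LinkedHashSet drops later duplicates
    // without reordering the survivors.
    static String[] dedupe(String... values) {
        return new LinkedHashSet<>(Arrays.asList(values)).toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] deduped = dedupe("price:[0 TO 100]", "price:[0 TO 100]", "cat:book");
        System.out.println(Arrays.toString(deduped)); // prints [price:[0 TO 100], cat:book]
    }
}
```

Doing this once, at the highest level, sidesteps any downstream code that assumes the entry counts match the original request.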
Let me know if anyone disagrees violently. Otherwise I'll commit sometime this weekend assuming tests pass.