I tested the following schema with the same data in field and field2. Both reproduced the problem.
Ok good, that means the problem does not actually depend on whether docValues are used, which was the most confusing and surprising part of your initial bug report.
Then I tried to find out whether the cardinality value itself is causing the issue. I tried with 100000 to 120000 documents and both fields returned a cardinality, but after increasing the count to around 150000 it caused the exception.
ok, so somewhere around 150K docs is the sweet spot.
Reviewing the code you posted, i noticed a few things:
1) every doc gets a unique value in the field you are computing stats on
2) your query matches all docs
3) because of how your uniqueKey is defined using composite routing keys ("!") every doc will wind up in the same shard.
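To make those three conditions concrete, here is a rough sketch of the shape of data and request involved. It is not the reporter's actual code: the class name ReproDocs, the route prefix "route", and the use of plain maps instead of SolrJ documents are all assumptions for illustration; the field name "field" comes from the schema quoted above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReproDocs {

    // Builds `count` docs that all share one composite routing prefix
    // (so they would all be routed to the same shard) and that each carry
    // a unique value in "field". The prefix is a hypothetical placeholder.
    static List<Map<String, Object>> buildDocs(String routePrefix, int count) {
        List<Map<String, Object>> docs = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", routePrefix + "!" + i); // same prefix before "!" for every doc
            doc.put("field", (long) i);           // unique value per doc
            docs.add(doc);
        }
        return docs;
    }

    // Request params for a match-all query asking for the cardinality stat
    // at the highest accuracy setting (cardinality=1.0).
    static String statsParams(String field) {
        return "q=*:*&rows=0&stats=true&stats.field={!cardinality=1.0}" + field;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = buildDocs("route", 150_000);
        System.out.println(docs.size() + " docs, first id = " + docs.get(0).get("id"));
        System.out.println(statsParams("field"));
    }
}
```

Indexing docs of this shape into a multi-shard collection and issuing the stats request is the scenario described in points 1-3.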
the combination of all of these means that ultimately what's causing problems is:
- building an HLL data structure using the max possible log2m & regwidth opts (that's what cardinality=1.0 does)
- adding ~150K unique(ish) hash values to the HLL
- serializing the HLL to bytes (which is what happens in a distributed query, to pass the shard's result to the coordinator)
based on that, i was able to create a unit test that demonstrates the same underlying ArrayIndexOutOfBoundsException, which i'll attach shortly. i still haven't dug in enough to understand the cause.
(NOTE: since Solr 5.2.1, we've forked the HLL and imported it directly into the org.apache.solr.util.hll package, but the basic structure/functionality of the various classes is still the same)