Description
While working through SOLR-7605, I confirmed that the underlying problem exists for regular facet.field situations, regardless of distrib mode, for Trie fields that have a non-zero precisionStep. So far this has only been reproduced when the RandomCodec was in use.
The problem, when it manifests, is that faceting on a TrieIntField using facet.mincount=0 causes the facet results to include three instances of the facet value "0", each listed with a count of "0", even though no document in the index contains this value at all...
[junit4] > <lst name="facet_fields">
[junit4] > <lst name="foo_ti">
[junit4] > <int name="20">32</int>
...
[junit4] > <int name="50">21</int>
[junit4] > <int name="0">0</int>
[junit4] > <int name="0">0</int>
[junit4] > <int name="0">0</int>
This is concerning for a few reasons:
- In the case of PivotFaceting, getting duplicate values back from a single shard like this triggers an assert in distributed queries and the request fails. Even if asserts aren't enabled, the bogus "0" value can be propagated to clients if they ask for facet.pivot.mincount=0
- Client code expecting a single (value,count) pair for each value may equally be confused/broken by this response where the same "value" is returned multiple times
- w/o knowing the root cause, it seems very possible that other nonsense values may be getting returned, i.e., if the error only happens with fields utilizing precisionStep, then it's likely related to the synthetic values used for faster range queries, and other synthetic values may be getting included with bogus counts
A patch with a simple test that can demonstrate the bug fairly easily will be attached shortly.
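To illustrate the second concern, here is a minimal sketch of how client code might detect a response that repeats the same facet value. It does not use SolrJ or any Solr API; the class name `FacetDuplicateCheck`, the method `findDuplicateValues`, and the representation of facet counts as a list of (value, count) entries are all assumptions made for illustration, and the sample data mirrors the output shown above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
public class FacetDuplicateCheck {

    /** Returns the facet values that occur more than once, in first-seen order. */
    public static List<String> findDuplicateValues(List<Map.Entry<String, Integer>> facetCounts) {
        // Count how many times each facet value is listed in the response.
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : facetCounts) {
            occurrences.merge(entry.getKey(), 1, Integer::sum);
        }
        // Any value listed more than once violates the one-pair-per-value expectation.
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : occurrences.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Assumed stand-in for the parsed "foo_ti" facet counts from the report.
        List<Map.Entry<String, Integer>> counts = new ArrayList<>();
        counts.add(new AbstractMap.SimpleEntry<>("20", 32));
        counts.add(new AbstractMap.SimpleEntry<>("50", 21));
        counts.add(new AbstractMap.SimpleEntry<>("0", 0));
        counts.add(new AbstractMap.SimpleEntry<>("0", 0));
        counts.add(new AbstractMap.SimpleEntry<>("0", 0));
        System.out.println(findDuplicateValues(counts)); // prints [0]
    }
}
```

A client that instead loads the pairs straight into a map would silently collapse the duplicates, which hides the bug rather than surfacing it.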
Attachments
Issue Links
- incorporates
  - SOLR-7605 TestCloudPivotFacet failures: Must not add duplicate PivotFacetValue with redundent inner value (Closed)
- is broken by
  - LUCENE-6529 NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can results in incorrect SortedSetDocValues (Resolved)