TL;DR: I think this test needs some small deltas when doing float/double comparisons – but I want to think about it a bit more.
FYI: I couldn't immediately reproduce that seed at r1686892 on 5x; the first time I tried to beast it, it made it 5 iterations before it failed with the exact same numeric values.
What this test does is execute a pivot facet request with stats enabled, then execute a "drill down" query for each pivot field value (by adding an fq), and confirm that the numResults matches the previously returned pivot count and that the top level stats of the drill down query equal the previously returned per-pivot stats.
I remember when writing this test that I expected I would need to use 3-arg asserts on the floating point stat values to account for a small epsilon that might be introduced in the floating point math because of the different order of computation – but when I ran the initial straw man test I never got any errors.
I also remember working through the problem in my head for a few minutes to try to figure out why it wasn't failing. I figured that regardless of whether it was top level stats or pivot stats, the same code was going to accumulate the (per-index) values in the same order (the test doesn't do any index updates concurrent with searching, for obvious reasons), so I just left it alone.
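For illustration, here's a minimal standalone sketch of what such a 3-arg assert does. The helper below mirrors the shape of JUnit's `assertEquals(double expected, double actual, double delta)`; the helper name and the epsilon value are hypothetical, not what the actual test uses:

```java
public class DeltaAssertSketch {
  // Hypothetical helper mirroring JUnit's 3-arg assertEquals: the check passes
  // as long as the two values are within delta of each other.
  static void assertEqualsWithDelta(double expected, double actual, double delta) {
    if (Double.compare(expected, actual) != 0
        && Math.abs(expected - actual) > delta) {
      throw new AssertionError("expected " + expected + " but got " + actual);
    }
  }

  public static void main(String[] args) {
    double pivotSum = 0.1 + 0.2;  // stat accumulated one way
    double drillDownSum = 0.3;    // "same" stat computed another way
    // An exact comparison would fail here: 0.1 + 0.2 != 0.3 in doubles,
    // but the two values agree to well within any reasonable epsilon.
    assertEqualsWithDelta(pivotSum, drillDownSum, 1e-9);
    System.out.println("within delta");
  }
}
```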
Looking at these failures now, and after adding some extra logging and thinking about it a bit more, I think we've just been getting lucky that this hasn't failed before.
I can think of 2 ways for this type of validation to fail on exact floating point comparison:
- if different replicas get hit by the two top level queries, or a different replica was used to refine a pivot than is later used by the verification query, and those replicas have slightly different segments (due to recovery, retries, or whatever), then the accumulation of the "sum" of a field over all the matching docs adds the numbers up in a different order.
- if there are more than 2 shards, and the 2 top level queries (or the refinement of a pivot query) get responses from the shards in a different order, then the aggregation of the per-shard "sum" values involves adding numbers in a different order.
If the "sum" is different, then the "mean" might be different as well (depending on precision loss when dividing by the count).
But most of the time we deal with numeric values that either don't lose precision when added up, or lose the same amount of precision in both orders.
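To make the order-dependence concrete, here's a small standalone sketch (not Solr code) showing the same three double values producing different sums, and therefore different means, depending on the order of accumulation:

```java
public class SumOrderDemo {
  public static void main(String[] args) {
    // Same three values, accumulated in two different orders.
    // Math.ulp(1e16) is 2.0, so adding 1.0 to 1e16 is lost entirely.
    double a = (1e16 + 1.0) + -1e16;  // 1e16 + 1.0 rounds back to 1e16 => 0.0
    double b = (1e16 + -1e16) + 1.0;  // cancellation happens first      => 1.0
    System.out.println(a + " vs " + b);              // prints: 0.0 vs 1.0
    System.out.println((a / 3) + " vs " + (b / 3));  // the derived means differ too
  }
}
```

The values here are deliberately extreme to force the effect; with typical test data the sums agree exactly most of the time, which is why the test has gotten lucky so far.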