Saw this once last month and again today, so not super flaky but still worth fixing:
I talked about this with Grant Henke when it last happened. The issue appears to be in the LongAccumulator used to track collisions in the data generator. Before the failure, the test logged this:
The assert code looks like this:
So the value of this LongAccumulator was zero even though there was one collision. Our thinking was that accumulators like these were updated asynchronously and so if we don't wait for the entire job to finish, we may not be getting their up-to-date values at assertion time.
We publish other LongAccumulators in kudu-spark, but AFAICT this is the only one that is asserted on. Nevertheless, it would be great if we could solve this in some generic way so that if someone wrote a test that used a different LongAccumulator, the race could be avoided.