Nightly benchmarks have been upset for the past ~1.5 weeks because it looks like KnnVectorQuery is giving slightly different results on every run, even on an identical (deterministically constructed – single thread indexing, flush by doc count, SerialMergeScheduler, LogDocCountMergePolicy, etc.) index each night. It produces failures like this, which then abort the benchmark to help us catch any recent accidental bug that alters our precise top N search hits and scores:
At first I thought this might be expected because of the recent (awesome!!) improvements to HNSW, so I tried to simply "regold". But the regold did not "take", so it indeed looks like there is some non-determinism here.
I pinged email@example.com and he found this random seeding that is most likely the cause?
Can we somehow make this deterministic instead? Or maybe the nightly benchmarks could somehow pass something in to make results deterministic for benchmarking? Or ... we could also relax the benchmarks to accept non-determinism for the KnnVectorQuery task?
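
To illustrate the "pass something in" option, here is a minimal sketch of a randomized step that takes its seed from the caller, so a benchmark could pin it. This is not Lucene's code: the class `SeededChoice` and the method `assignLevels` are hypothetical names made up for the example, and the assumption is only that the non-deterministic step draws from a random number generator whose seed could be supplied from outside.

```java
import java.util.Arrays;
import java.util.SplittableRandom;

// Hypothetical example class, not part of Lucene.
class SeededChoice {

    // Hypothetical randomized step: draws a small integer for each of n items.
    // The seed is a parameter, so the caller decides whether runs are repeatable.
    static int[] assignLevels(int n, long seed) {
        SplittableRandom random = new SplittableRandom(seed);
        int[] levels = new int[n];
        for (int i = 0; i < n; i++) {
            int level = 0;
            // Keep "promoting" with probability 0.5, capped at 16.
            while (level < 16 && random.nextDouble() < 0.5) {
                level++;
            }
            levels[i] = level;
        }
        return levels;
    }

    public static void main(String[] args) {
        // Same seed twice: the two runs draw the same sequence.
        int[] first = assignLevels(10, 42L);
        int[] second = assignLevels(10, 42L);
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```

With something along these lines, the nightly benchmark could pass a fixed seed while normal usage keeps whatever seeding it has today.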