Details
-
Task
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
None
-
None
-
None
-
None
-
New
Description
Now that julietibs improved merging via LUCENE-10375, nightly benchmarks report the following two top CPU consumers:
18.45% 548638 org.apache.lucene.util.VectorUtil#dotProduct() at org.apache.lucene.index.VectorSimilarityFunction$2#compare() at org.apache.lucene.util.hnsw.HnswGraph#search() at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() 13.23% 393609 org.apache.lucene.util.LongHeap#upHeap() at org.apache.lucene.util.LongHeap#push() at org.apache.lucene.util.hnsw.NeighborQueue#add() at org.apache.lucene.util.hnsw.HnswGraph#search()
Exploration at LUCENE-7979 suggested that radix heaps perform better than binary heaps if the heap is large: queries with 16 clauses or more got faster while queries with fewer clauses would get slower. With a default beam width of 100, I'd be interested in seeing how HNSW indexing performs if we replace the current binary LongHeap with a radix heap.