[LUCENE-10383] Explore moving HNSW's NeighborQueue to a radix heap - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Task
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
None

Lucene Fields:

New

Description

Now that julietibs improved merging via ~~LUCENE-10375~~, nightly benchmarks report the following two top CPU consumers:

18.45%        548638        org.apache.lucene.util.VectorUtil#dotProduct()
                              at org.apache.lucene.index.VectorSimilarityFunction$2#compare()
                              at org.apache.lucene.util.hnsw.HnswGraph#search()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
13.23%        393609        org.apache.lucene.util.LongHeap#upHeap()
                              at org.apache.lucene.util.LongHeap#push()
                              at org.apache.lucene.util.hnsw.NeighborQueue#add()
                              at org.apache.lucene.util.hnsw.HnswGraph#search()

Exploration at ~~LUCENE-7979~~ suggested that radix heaps perform better than binary heaps if the heap is large: queries with 16 clauses or more got faster while queries with fewer clauses would get slower. With a default beam width of 100, I'd be interested in seeing how HNSW indexing performs if we replace the current binary LongHeap with a radix heap.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Adrien Grand

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 20/Jan/22 08:23

Updated:: 28/Aug/22 16:34