Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-10383

Explore moving HNSW's NeighborQueue to a radix heap

Details

    • Task
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • None
    • None
    • None
    • None
    • New

    Description

      Now that julietibs improved merging via LUCENE-10375, nightly benchmarks report the following two top CPU consumers:

      18.45%        548638        org.apache.lucene.util.VectorUtil#dotProduct()
                                    at org.apache.lucene.index.VectorSimilarityFunction$2#compare()
                                    at org.apache.lucene.util.hnsw.HnswGraph#search()
                                    at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
      13.23%        393609        org.apache.lucene.util.LongHeap#upHeap()
                                    at org.apache.lucene.util.LongHeap#push()
                                    at org.apache.lucene.util.hnsw.NeighborQueue#add()
                                    at org.apache.lucene.util.hnsw.HnswGraph#search()
      

      Exploration at LUCENE-7979 suggested that radix heaps perform better than binary heaps if the heap is large: queries with 16 clauses or more got faster while queries with fewer clauses would get slower. With a default beam width of 100, I'd be interested in seeing how HNSW indexing performs if we replace the current binary LongHeap with a radix heap.

      Attachments

        Activity

          People

            Unassigned Unassigned
            jpountz Adrien Grand
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: