Lucene - Core / LUCENE-10318

Reuse HNSW graphs when merging segments?

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Lucene Fields: New

    Description

      Currently, when merging segments, the HNSW vectors format rebuilds the entire graph from scratch. Building these graphs is very expensive, so it'd be nice to optimize the process in any way we can. I was wondering if, during a merge, we could choose the largest segment with no deletes and load its HNSW graph onto the heap. Then we'd add the vectors from the other segments to this graph through the normal build process. This could cut down on the number of operations we need to perform when building the merged graph.

      This is just an early idea; I haven't run experiments to see if it would help. I'd guess that whether it helps would also depend on details of the MergePolicy.
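
The idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Lucene code: `HnswMergeSketch`, `Segment`, `pickSeed`, and `insertionsNeeded` are hypothetical names, and the sketch assumes each segment's vector count includes the vectors of deleted documents. It shows only the seed selection and a count of how many graph insertions would remain; it does not build or load a real HNSW graph.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of seed selection for a merge; none of these types are Lucene APIs. */
final class HnswMergeSketch {

  /** Minimal stand-in for one segment's vector data (illustrative only). */
  static final class Segment {
    final String name;
    final int vectorCount;   // assumed to include vectors of deleted docs
    final int deletedCount;

    Segment(String name, int vectorCount, int deletedCount) {
      this.name = name;
      this.vectorCount = vectorCount;
      this.deletedCount = deletedCount;
    }
  }

  /** Picks the largest segment with no deletes, if one exists, to seed the merged graph. */
  static Optional<Segment> pickSeed(List<Segment> segments) {
    Segment best = null;
    for (Segment s : segments) {
      if (s.deletedCount == 0 && (best == null || s.vectorCount > best.vectorCount)) {
        best = s;
      }
    }
    return Optional.ofNullable(best);
  }

  /** Counts the live vectors that still need to be inserted: all segments except the seed. */
  static int insertionsNeeded(List<Segment> segments) {
    Optional<Segment> seed = pickSeed(segments);
    int total = 0;
    for (Segment s : segments) {
      if (seed.isPresent() && s == seed.get()) {
        continue; // the seed's graph is reused as-is
      }
      total += s.vectorCount - s.deletedCount;
    }
    return total;
  }
}
```

If every segment has deletes, `pickSeed` returns an empty result and the sketch falls back to inserting all live vectors, which corresponds to the current rebuild-from-scratch behavior. Note that the largest segment overall may be skipped when it has deletes, which is one reason the benefit would depend on the MergePolicy.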

          People

            Assignee: Unassigned
            Reporter: Julie Tibshirani (julietibs)
            Votes: 2
            Watchers: 5
