Lucene - Core / LUCENE-10592

Should we build HNSW graph on the fly during indexing

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.4
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Currently, when we index vectors for KnnVectorField, we buffer them in memory and build an HNSW graph on flush, during segment construction. Because building an HNSW graph is very expensive, the flush operation takes a long time. This also makes overall indexing performance quite unpredictable: the number of flushes is determined by memory usage and by the presence of concurrent searches, so some indexing operations return almost instantly while others, which trigger a flush, take a long time.

      Building the HNSW graph on the fly as we index vectors avoids this problem and spreads the cost of graph construction evenly across indexing.

      This will also supersede LUCENE-10194.
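
      To illustrate the difference between the two strategies, here is a minimal, self-contained sketch. It does not use Lucene's classes: ToyGraph, BufferingIndexer, OnTheFlyIndexer and OnTheFlyDemo are hypothetical names, and the graph is a brute-force single-layer neighbor graph rather than a real HNSW graph. The number of distance computations is used as a rough stand-in for construction cost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Toy single-layer neighbor graph. Hypothetical stand-in, NOT Lucene's HNSW code. */
class ToyGraph {
    final List<float[]> vectors = new ArrayList<>();
    final List<List<Integer>> neighbors = new ArrayList<>();
    final int maxConn;
    long distanceComputations = 0; // rough proxy for construction cost

    ToyGraph(int maxConn) { this.maxConn = maxConn; }

    /** Adds one node and links it to its nearest existing nodes (brute force). */
    void insert(float[] v) {
        int node = vectors.size();
        double[] dist = new double[node];
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < node; i++) {
            dist[i] = squaredDistance(v, vectors.get(i));
            distanceComputations++;
            candidates.add(i);
        }
        candidates.sort((a, b) -> Double.compare(dist[a], dist[b]));
        List<Integer> links =
            new ArrayList<>(candidates.subList(0, Math.min(maxConn, node)));
        for (int n : links) {
            neighbors.get(n).add(node); // reverse link; no pruning in this toy
        }
        vectors.add(v);
        neighbors.add(links);
    }

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}

/** Strategy described as the current behavior: buffer vectors, build on flush. */
class BufferingIndexer {
    private final List<float[]> buffer = new ArrayList<>();
    private final int maxConn;

    BufferingIndexer(int maxConn) { this.maxConn = maxConn; }

    void addVector(float[] v) { buffer.add(v); } // cheap: no graph work yet

    ToyGraph flush() { // all graph construction happens here
        ToyGraph g = new ToyGraph(maxConn);
        for (float[] v : buffer) {
            g.insert(v);
        }
        return g;
    }
}

/** Proposed strategy: insert into the graph as each vector arrives. */
class OnTheFlyIndexer {
    final ToyGraph graph;

    OnTheFlyIndexer(int maxConn) { graph = new ToyGraph(maxConn); }

    void addVector(float[] v) { graph.insert(v); } // cost paid per document

    ToyGraph flush() { return graph; } // nothing left to build
}

class OnTheFlyDemo {
    public static void main(String[] args) {
        Random random = new Random(42);
        BufferingIndexer buffered = new BufferingIndexer(4);
        OnTheFlyIndexer onTheFly = new OnTheFlyIndexer(4);
        for (int i = 0; i < 1000; i++) {
            float[] v = {random.nextFloat(), random.nextFloat()};
            buffered.addVector(v);
            onTheFly.addVector(v);
        }
        long doneBeforeFlush = onTheFly.graph.distanceComputations;
        long bufferedFlushWork = buffered.flush().distanceComputations;
        long onTheFlyFlushWork = onTheFly.flush().distanceComputations - doneBeforeFlush;
        System.out.println("buffered, work done in flush:   " + bufferedFlushWork);
        System.out.println("on the fly, work done in flush: " + onTheFlyFlushWork);
    }
}
```

      In this toy, both strategies insert vectors in the same order and so produce the same graph with the same total work. Only the moment at which the work is paid differs: all at once in flush() for the buffering variant, or spread over the addVector() calls for the on-the-fly variant.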

          People

            Assignee: Mayya Sharipova
            Reporter: Mayya Sharipova
            Votes: 1
            Watchers: 7

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 8h