  1. Lucene - Core
  2. LUCENE-9583

How should we expose VectorValues.RandomAccess?

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 9.0
    • None
    • None
    • None
    • New

    Description

      In the newly added VectorValues API, we have a RandomAccess sub-interface. jtibshirani pointed out that some vector-indexing strategies do not need it, since they can operate using only a forward iterator (HNSW does need it). In the interest of simplifying the public API, we should not expose this internal detail, which also surfaces internal ordinals that are of little interest outside the random-access API.
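
      To make the distinction concrete, here is a minimal sketch of the two access patterns. The names ForwardVectors, OrdinalVectors, InMemoryVectors and AccessPatterns are hypothetical and heavily simplified; they are not the actual VectorValues API. The centroid computation stands in for a strategy that needs only forward iteration, and the pairwise distance stands in for the arbitrary-ordinal lookups that a graph-based approach such as HNSW performs.

```java
// Hypothetical, simplified shapes for illustration only; not Lucene's actual API.
interface ForwardVectors {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  int nextDoc();         // advance to the next document that has a vector

  float[] vectorValue(); // vector of the current document
}

interface OrdinalVectors {
  int size();

  float[] vectorValue(int ord); // lookup by internal ordinal
}

// Toy in-memory store that can serve both access patterns.
final class InMemoryVectors implements ForwardVectors, OrdinalVectors {
  private final int[] docIds;      // ascending doc ids; array index = ordinal
  private final float[][] vectors;
  private int pos = -1;

  InMemoryVectors(int[] docIds, float[][] vectors) {
    this.docIds = docIds;
    this.vectors = vectors;
  }

  @Override
  public int nextDoc() {
    pos++;
    return pos < docIds.length ? docIds[pos] : NO_MORE_DOCS;
  }

  @Override
  public float[] vectorValue() {
    return vectors[pos];
  }

  @Override
  public int size() {
    return docIds.length;
  }

  @Override
  public float[] vectorValue(int ord) {
    return vectors[ord];
  }
}

final class AccessPatterns {
  // Needs only forward iteration: mean of all vectors in a single pass.
  static float[] centroid(ForwardVectors values, int dim) {
    float[] sum = new float[dim];
    int count = 0;
    while (values.nextDoc() != ForwardVectors.NO_MORE_DOCS) {
      float[] v = values.vectorValue();
      for (int i = 0; i < dim; i++) {
        sum[i] += v[i];
      }
      count++;
    }
    for (int i = 0; i < dim; i++) {
      sum[i] /= Math.max(count, 1);
    }
    return sum;
  }

  // Needs random access: compares two arbitrary ordinals, as graph traversal does.
  static float squaredDistance(OrdinalVectors values, int ordA, int ordB) {
    float[] a = values.vectorValue(ordA);
    float[] b = values.vectorValue(ordB);
    float d = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      d += diff * diff;
    }
    return d;
  }
}
```

      In this sketch the ordinal only appears in OrdinalVectors, which is the part that could stay internal if callers never need more than the forward iterator.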

      I looked into how to move this inside the HNSW-specific code and remembered that we also currently use the random-access (RA) API when merging vector fields over sorted indexes. Without it, we would need to load all vectors into RAM while flushing/merging, as we currently do in BinaryDocValuesWriter.BinaryDVs. I wonder if it's worth paying this cost for the simpler API.
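
      The trade-off can be sketched roughly as follows. This is an illustration under assumptions, not the actual flush/merge code: SortedMergeSketch, the newToOld mapping, and both method names are hypothetical, and the returned list merely stands in for the output that would be written to disk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch; names and shapes are illustrative, not Lucene's codec API.
final class SortedMergeSketch {

  // newToOld[i] is the old ordinal of the vector that belongs at new position i.
  // With random access, each vector is fetched when needed and can be written
  // immediately; the returned list stands in for the output file.
  static List<float[]> writeWithRandomAccess(IntFunction<float[]> byOldOrd, int[] newToOld) {
    List<float[]> out = new ArrayList<>();
    for (int oldOrd : newToOld) {
      out.add(byOldOrd.apply(oldOrd));
    }
    return out;
  }

  // With only a forward iterator (in old order), every vector has to be
  // buffered on the heap before the first one can be written in sorted order.
  static List<float[]> writeWithForwardOnly(Iterator<float[]> inOldOrder, int[] newToOld) {
    List<float[]> buffered = new ArrayList<>();
    while (inOldOrder.hasNext()) {
      buffered.add(inOldOrder.next());
    }
    List<float[]> out = new ArrayList<>();
    for (int oldOrd : newToOld) {
      out.add(buffered.get(oldOrd));
    }
    return out;
  }
}
```

      Both methods produce the same output order; the difference is that the forward-only variant holds the whole field in memory at once, which is the cost in question.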

      Another thing I noticed while reviewing this is that I had moved the KNN search(float[] target, int topK, int fanout) method from VectorValues to VectorValues.RandomAccess. I think we could move it back and handle the HNSW requirements for search elsewhere. I wonder if that would alleviate the major concern here?

          People

            julietibs Julie Tibshirani
            sokolov Michael Sokolov
            Votes: 1
            Watchers: 4

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h